{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Classification (1) – an issue with distance measures, and an implementation of Nearest Neighbour classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we will expand on some of the concepts of \n",
    "classification, starting with an experiment with distance measures on data, then looking into the $k$-Nearest Neighbour algorithm. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Distance measures for high-dimensionality data\n",
    "\n",
    "Algorithms such as $k$-Nearest Neighbour are conceptually very simple -- we predict the class value of an unlabelled *query* data point we are given by looking at all the labelled data point(s) in our data set, and predicting that our query will have the same class as the most similar data point(s) in the training set. So, all we need is a way of measuring similarity. The well-known *Euclidean distance measure* would seem to be a good choice. However, while we are very familiar with Euclidean distance in 2 and 3-dimensions, there was a warning (Slide 62 of the \"Classification (1)\" lecture) that in high-dimensions there is a problem – what was this problem ? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Pairwise distances in high-dimensional spaces \n",
    "\n",
    "**Answer**: in high-dimensional spaces everything is far away from everything else, and so pairwise distances become uninformative."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But what does this actually mean ? There is a mathematical argument to show that this is a true statement, but an alternative approach is simply to simulate what happens. One approach is to randomly generate $N$ points inside a $d$-dimensional cube centred around zero, such as $[-0.5, 0.5]^{d}$. Now we calculate the pairwise distances among the $N$ points.  After that for every data point we calculate the ratio of the minimum distance to the maximum distance  to all of the other data points. The mean ratio represents the average range of pairwise distances there are in that dimensionality. We run the simulation from 1 dimension to 1000 dimensions and the ratios will be plotted on a line chart using the ``` matplotlib ``` library. \n",
    "\n",
    "You should use the ```numpy``` library for this, and in particular the linear algebra methods to calculate distances such as the [L2 norm](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEWCAYAAABi5jCmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xd8XNWZ//HP15Is2bIs23Jvkg2m2KaLFggJhCSEFJIsCZCQBruQzZKETdiEZLOEsNlN25Ql4beEADGhhrTFEFhIAZMGSKYa0wxI7lWy5CZb5fn9cY7skVC5MhqNZuZ5v17zmpnb5jkzd86595x7z5GZ4ZxzznUakekAnHPODS9eMDjnnOvCCwbnnHNdeMHgnHOuCy8YnHPOdeEFg3POuS5yomCQ9HVJmyWtj+/fJ2mVpO2SjspgXH3GEafPzURsAyXpjZJeGKLPulLSLfH17Pg9FQzFZych6VpJ/zYEn1Mn6fT4+suSrk/3Zw6lwUrTUO2bkkzSgfH1kOwDmaJsuI9BUh0wBWhPmbzIzC6RNAt4Eag0s41x+ZeBz5nZXa/zcw2YZ2Yr9nP9QYkj30i6EjjQzM7PdCyZFPf7vzez33ebXgW8ChSZWdvQR5af9ic/6O03HO4KMx3AALy7ly+3EtjSWSikTHt2aMLq05DHIanQMwuX7Xw/zjAzG/YPoA44vYfppwO7gA5gO3B7fDZgB/ByXG468CtgE+FI6zMp2ygAvgy8DGwDlgKzgIdTtrMdOKeHzx8BfAWoBzYCPwPKgeKe4uhhfSMcGQMsAq4BfhvjeBQ4IGXZBcDvgAZgA/DlOP1K4JfALUAz8PcxrstjmrYAdwITUrb1C2A90BTTuSBl3pnA8hjDGuCyOP3NwOpuv8llwNNxOz8HSlLmfwFYB6yNMe1Naw/fwxxgSfzM3wE/Am6J86riuoXx/UPA14G/xu/4bqACuDWmvwaoStn2ISnf2wvAB1Pm9fqdAwK+H3/XppjOhSnrfT1lO/8ArIifsRiY3u03/iTwEtAYP6/zTP0A4I/xN9oc0zCup/0+/s6d38nKuN3t8fGm+NmHpaw7mfDfmJR0v+32fX8sfs5m4F/7+G8uAq6N3/G2+DtWpsz/b2BV/G2WAm9MmZeaps7PvTB+7sPATcDn4/wZcf6n4vsDY5rFa/fNLxL23W3xN39LSrp7/V/0kLZ/Yd8+fAGv/b9+Pb6eCNwDbI0x/Sl+1s2EvGlX/J2+kOD/t4j9ywd6TRtQQsgftsQYa4Apfea5mczwkz7opWDoKcNK+TN2/oAj4g55BTASmAu8Arw95cd/Bjg47mRHABXdt9PLZ19AyBDmAmOAXwM39xRHL+t339EagOMIZ3K3AnfEeWVxB/18/JHLgONT/lytwHtjWkcBlwKPADMJhdSPgdu7xV0W5/0AeDJl3jrinxcYDxzd0/ccf5PHCIXuBOA54JNx3hlxx18AjCb8QfoqGP4GfC/GcwrhD9FXwbCCkKmWEwqxFwkHCYWETO6ncdlSQqb0iTjvaEJGtyDBd/52wn4zLu4XhwLTesgUTovbPDrG/0Pg4W6/8T1xO7MJBydnpGRub43rTSJkEj/oab+n50y0MGXZ/wd8K+X9Z4G7B7rfpmz7J4R96QhgN3BoL9taFH+vU2I6/hv4c8r88wkFdyFh/11PPIDoJU0/i7/bqBjn3XH+hwiZ3s9T0nBX932T8D9eRSyc43Y7C/s+/xfd0nUGIeNdGOO5jd4Lhm8QCsei+Hgj+wr/vb9hwv/fIvYvH+g1bcDFhAOo0YQD4WOAsX3muenM0AfrEb/c7YTSrvPxDz1lWD1kuMcDK7vN/xL7Mo8XgLN6+dz+MvY/EI9gUnbKVvZlYgMtGK5PmXcm8Hx8fR7wRC/buJKUjChOe454lBTfT0uNq9uy42IcnUeMK+OONLbbcl2+5/ibnJ/y/tvAtfH1jcA3UuYd2Nt3
Qcgs24DSlGm30XfB8K8py34XuC/l/buJfzTgHOBP3T7vx8BXE3znpxEKnBOAEd22sYh9mcINwLdT5o2J33VVym98csr8O4HLe/kt35v6OzOwguF4QoY4Ir6vJeXsKOl+m7LtmSnzHwPO7WVbi4gZV0r624FZvSzfCBzRR5rmpix7AOG/PoKQ8V7MvgLgJkL7HXQtGA4knAWdTmiD2d//xY3AN1PeH0TvBcNVwF30vH/v/Q17+T66//8WsX/5QK9pIxREfwUO7y2O7o9suirpvWY2LuXxk4TrVQLTJW3tfBCqjqbE+bMIRyL7YzrhdLxTPeGHmNLz4v1an/J6J+FPBv3HuKrb+0rgNynpfY7wZ50iqUDSNyW9LKmZsONCOB0G+DvCzlgvaYmkE/cj3undYuoeX6rpQKOZ7UiZVt/bwtGGlNe7enjfGUclcHy33/7DwNT+0mBmfyRUaV0DbJB0naSxvcS/N14z2044ZZ/R32dImizpDklr4m9xC/t+hwExs0cJ1ZZvknQIIYNc3MviSfbb3n7bnuz9fWP6G+JnIOnzkp6T1BS//3L6TmPqtl4mHBAeSTgKvwdYK+lgQvXZku4rW2gYvpRQ6GyM3+/0OLvX/0UPcXTfh/vaJ79DOAN7QNIrki7vbcEE/z/Yv3ygr7TdDNwP3CFpraRvSyrqIz1ZVTDsr1XAq90KlTIzOzNl/gH7ue21hB+kU+fR74aeF99v/cVoPSz/jm5pLjGzNYRT8rMIR1TlhCM1CNUlmFmNmZ1FqKP+X8IR7kCtI5zSdprVz7LjJZWmTJu9H5/Zk1XAkm7fwxgz+8ckK5vZ1WZ2DKFK7CBCtWN3XfaBmI4KQh13f75B+O0ON7OxhGoXJQmtl+k3xW18BPilmbX0stxg77d7f19JYwhVi2slvZFQ3/9BYLyZjSPUq/eVxu5pWwKcDYyM++8S4KOEas4ne9yA2W1mdjIhjQZ8K87q63/R3Tq67re97pNmts3MPm9mcwlnrJ+T9JZe0tPn/68ffeUDvabNzFrN7GtmNh94A/AuwnfYq3woGB4DmiV9UdKoWGIvlHRsnH898O+S5ik4XFJFnLeBUA/bm9uBf5Y0J/4h/pNQBzrYV1PcA0yVdKmkYkllko7vY/lrgf+QVAkgaZKks+K8MkKd8RZCneN/dq4kaaSkD0sqN7NWQoNhOwN3J/AJSYdKGk1o3+mRmdUTqj2+Fj//ZMKfazDcAxwk6SOSiuLjWEmH9rdiXO74eGS1A2ih5+/iNkJaj5RUTPg+HzWzugTxlRGrSCXNoOeCpyebCI2a3ffNm4H3EQqHn/Wx/mDvt2dKOlnSSODfCelfRUhfW4y3UNIVQE9nXX1ZAlxCaH+BUJX4aUI7xmt+D0kHSzot/hYthDPIzuX6+l90dyfwcUnz4z781d4ClPQuSQdKEvv+M52f2T0P6fX/l0Bf+UCvaZN0qqTDFO4FaiZUMfX5v86mguFuhRudOh+/SbJS3HneTTgdfZXQUHg9obSG0Oh5J/AA4Uu7gdDwBeF09KZ4evbBHjZ/I+HP+HDcdgthpx1UZraN0Ej5bsJp5kvAqX2s8t+EaoQHJG0jNEp17kA/I5wWryE03D7Sbd2PAHXxNPeThExmoPHeB1wNPEg4xf5bnLW7l1U+FONrIPwB+8rUBhLHNuBtwLmEo+T1hKPH4gSrjyU0wDYSvq8twH/18Bl/AP6NcNXbOsIR3bkJQ/waodG6iXAVyq+TrGRmO4H/AP4S980T4vTVwOOEo9Q/9bGJwd5vbyP8bg2Ehs0Px+n3A/cR2mrq4+f0Va3YkyWEzLSzYPgzIUN9uJfli4FvEv7n6wlnvl+O8/r6X3QR9+EfEK4aWxGfezMP+D2hkP8b8P/M7KE47xvAV+LvdBn9//961U8+0FfaphKuXGwmVDEtIVRb9iorbnBz2S0eoS8DitNwNuVSSLoRWGtmXxmiz1tEaPgdks9zQyObzhhcFlHoDmSkpPGEo/S7vVBI
L4U7ot9POOt1br95weDS5WJC3fLLhPrMRA2+bv9I+nfCWdl3zOzVTMfjsptXJTnnnOvCzxicc851kU2d6AEwceJEq6qqynQYzjmXVZYuXbrZzCYlWTbrCoaqqipqa2szHYZzzmUVSf31KLCXVyU555zrwgsG55xzXXjB4JxzrgsvGJxzznXRb8EgqVTSiPj6IEnv6a/LVuecc9kryRnDw0BJ7P3xD4TRsBalMyjnnHOZk6RgUOzN8f3AD83sfcD89IblnHMuU5LcxyCFUbw+TBioO+l6zjnnBqCltZ2mXa37Hjtbu7x/y6GTOXzmuLTHkSSDv5QwRvJvzOxZSXMJ/ew755zrpqW1neZdXTP0zsfWmNH3Nn93W0ef255UVjw8CgYzWwIs6Rx60cxeAT6T7sCccy6TWts7aNy5h4Yde2jYvoeGztc79uzNyLtn8Ft39p+5lxUXMnZUEeXxccCkMYwbHV6nTk99jBtdRFlJEQUjkowA+vr1WzDEaqQbCANSz5Z0BHCxmX0q3cE559xgMDN27mnfm7E37NjDlh17aIzPDTt207CjNT6H+c0tvQ8fMqa4MCUjL2TuxDEhE+8lgx8Xn8tKCiksGP53CSSpSvoB8HbCsHGY2VOSTkmycUlnEIacKwCuN7Nvdps/mzCA+bi4zOVmdm/y8J1z+aijw9i6q7VLht41o3/to7cj+aICMaF0JONHj6RizEgOGz+OCaOLmFBazIQxI5kweiQTSsO88aNHMm50EUVZkLm/Hokakc1sVRjneq9+B4iPA09fQxijdDVQI2mxmS1PWewrwJ1m9j+S5gP3AlUJY3fO5Zi29g42b9/DhuYWNm7bzcZtLWxsTn0Orzdv30N7R89jyYwpLmRCacjMp4wt4dBpY6koHcn4OK3zdUV8P6a4kG75W95LUjCskvQGwCSNJLQvPJdgveOAFbFNAkl3AGcRBsDuZIRB1wHKCQO2O+dyzO62djZtixl7Z6YfM/wNMcPftK2FLTv20H3sMAkqSkcyuayEyWOLOXRaGZPLSpg4pjODL95bEIwvLaK4sCAzicwhSQqGTxKqg2YQjvwfAP4pwXozgFUp71cDx3db5krgAUmfBkqB03vakKSLgIsAZs+eneCjnXNDYdee9nA0HzP61CP9Tdv2vd+6s/U16xaMEBPHhAx/enkJR84ax+SyYiaPLWZyWQlT4nPFmJE5X3Uz3CS5Kmkz4R6Ggerp3Kz7ud95wCIz+25s5L5Z0kIz61IZaGbXAdcBVFdX+1ikzg2BbS2trG9qYV1Ty77n5l1d3jftem2GX1QgJo0pZvLYEqoqSjluzoQuGf2kmPlXlBYP2VU2bmCSXJV0E/BZM9sa348HvmtmF/Sz6mpgVsr7mby2quhC4AwAM/ubpBJgIrAxWfjOuYEyM5p2tXbN8Jtiht+8ryDYvvu1V+VMHFPMtPISZk0YzXFzJjBlbAmTy4rDc8z4x40qYoRn+FktSVXS4Z2FAoCZNUo6KsF6NcA8SXOANcC5wIe6LbMSeAuwSNKhQAmwKVHkzrnX6OgwGnbueW2Gv/eIv4V1Tbtoae16hc4IweSyEqaWlzBv8hjeOG8i08pLmFo+KjzHjN/r7/NDkoJhhKTxZtYIIGlCkvXMrE3SJcD9hEtRb4x3Tl8F1JrZYuDzwE8k/TOhmunjZt2bnpxzncyMLTv2UL9lB/VbdlK3ZScrt+xgzdZQAGxs3s2e9q6ZfuEIMWVsCdPKS1gwfSynHzp5X4ZfHqZPGlOcFdfXu6GRpGD4LvBXSb+M7z8A/EeSjcd7Eu7tNu2KlNfLgZOShepcfujoMNY3t1C/ZSf1W3aEzL9hB3Wbd7KyYWeXKp4Rgmnlo5gxfhTVleNfk+FPLS9hYmmxV+24AUly5P8zSUuBUwkNyu/vdi+Cc26AWts7WLt1F3Ux869PfW7YyZ6Um7GKCsSs8aOprAj1+pUVo6mqKGV2xWhmjh/l1Ttu0CXtJfV5oLFzeUmzzWxl
2qJyLge0tLazqmFnl8y/bssOVjbsZHXjri43aJUUjaCqopQ5E0s59ZDJVFaMpnJCKZUVo5k+bpRfveOGVJKrkj4NfBXYQLjjWYT2gMPTG5pz2WHz9t0sW9PEc+u2Ubd5B/UNoRBY19TSZbmykkKqKko5bEY57z58OrPjkX9VxWgmlRX73bdu2EhyxvBZ4GAz25LuYJwbzsyMDc2hEFi2tolla5pZtqaJ9c37CoCJY4qprBjNiQdUUDmhlKqJo6msKKVywmjGjS7yzN9lhURdYgBN6Q7EueHEzFjduItnOwuAtU0sW9PE5u17gNBNwwGTxnDC3AksnFHOgunlzJ8+lvJRPhy6y35JCoZXgIck/RbY3TnRzL6XtqicG0IdHUZ9w86UM4FQGHTe1VswQsybPIY3HzyZhdPHsnBGOYdOG0tpsQ9k6HJTkj17ZXyMjA/nslZ7h/HKpu1dqoKWr21mW7wEdGTBCA6eWsaZh01lwfRyDptRzsFTyygp8it/XP5Icrnq14YiEOcGW2t7By9tCIXAs2uaeCY2EO9qDb3GlxSN4NBpY3nvUTNYOCOcCcybXMbIQr/Ry+W3JFclTQK+ACwgdFkBgJmdlsa4nBuwjdtaePjFzTy+spFla5p4fv22vfcDlI4sYMH0cs49bhaHzShn4Yxy5k4s9bt9netBkqqkW4GfA+8idMH9Mbw/IzcM7GnrYGl9I0te3MTDL25i+bpmAMaWFLJwRjkff0MVC2eUs3D6WKoqSv3uX+cSSlIwVJjZDZI+a2ZLgCWSlqQ7MOd6snLLTpa8uJElL27mby9vZseedgpHiGMqx/OFMw7mTQdN4tCpY70QcO51SFIwdHa4vk7SOwldZ89MX0jO7bNzTxuPvLKFJS9s4uGXNvPq5h0AzJowivcdPYNT5k3ixAMqKCvxy0SdGyxJCoavSyon9IT6Q8JQnJemNSqXt8yMFzZs4+EXN7HkxU3UvNrInvYORhUVcMLcCXzsxEredPBkqipG+81izqVJkoKh0cyaCDe5nQogyXtEdYNm6849/HnF5nhWsIkNzeF2mYOnlPHxk6o4Zd4kqqvG+yWjzg2RJAXDD4GjE0xzLpH2DuOp1Vv3nhU8tWorHRYajd84bxJvOmgSbzxoItPKR2U6VOfyUq8FQxyD+Q3AJEmfS5k1ljDwjnOJbWhuYUksCP780maadrUiwREzx/Hp0+ZxykGTOGJmuV8+6tww0NcZw0hgTFymLGV6M3B2OoNy2W93Wzu1dY17zwqeX78NgMllxbxt/hROOWgSJx84kfGlfjO9c8NNrwVDyqWpi8ysHkDSCGCMmTUPVYAuuzy3rplbHqnnrifXsn13GyMLRlBdNZ4vveMQTjloEodMLfNGY+eGuSRtDN+Q9EnCWAxLgXJJ3zOz76Q3NJctWlrbuW/ZOm55ZCVL6xspLhzBuw6fzpmHTeWEuRXe2ZxzWSbJP3a+mTVL+jBh/OYvEgoILxjy3MotO7n1sXp+Ubuahh17mDOxlK+881DOPmYm40Z7FZFz2SpJwVAkqQh4L/AjM2uVZP2t5HJTe4fxx+c3cssj9Tz80iZGSLz10Cmcf0Ilbzigwu84di4HJCkYfgzUAU8BD0uqJDRAuzyycVsLd9as4rZHV7K2qYUpY4v5zGnzOO+42UwtL+l/A865rJGk2+2rgatTJtVLOjV9Ibnhwsx45JUGbnm0nvuXraetwzjpwAquePd83nLoFIr80lLnclJf9zGcb2a3dLuHIZWP4Jajmlta+fXS1dzy6EpWbNxO+agiPvaGKj58/GzmThqT6fCcc2nW1xlDaXwu62MZl0OWrWnae6nprtZ2jphZznfOPpx3HzHdu6NwLo/0dR/Dj+Ozj+CWw1pa27nn6XXc/Eg9T63aSknRCM46Ygbnn1DJYTPLMx2ecy4D+qpKurq3eQBm9pnBD8cNlVc37+DWR+r5xdLVNO1qZe6kUq5413z+7uiZlI/2Lqydy2d9VSUtjc8n
AfMJo7gBfCBlnssibe0d/P65cKnpn1dspnCEePuCqXz4hNmcOLfC70h2zgF9VyXdBCDp48CpZtYa318LPDAk0blBsaG5hdsfW8kdj61ifXML08pL+NxbD+LcY2cxeaxfauqc6yrJfQzTCQ3QDfH9mDjNDXPPr2/mv3//Eg8s30B7h3HKQZO46qwFnHbIZO/F1DnXqyQFwzeBJyQ9GN+/CbgybRG5QfG75Rv4zO1PMLJwBBeePIcPHTebqoml/a/onMt7SW5w+6mk+4Dj46TLzWx9esNy+8vMuP5Pr/Kf9z3HYTPKuf6j1V5d5JwbkETdXsaC4K40x+Jep9b2Dq64axm3P7aKdyycyvc+eCSjRvr9B865gfH+kHNE085WPnXbUv6yYgufevMBXPa2g71DO+fcfvGCIQfUb9nBJxbVsKphJ985+3A+UD0r0yE557JYooJB0snAvNjeMIkwitur6Q3NJfHYqw1cfHMtBtx84fGcMLci0yE557JcvwWDpK8C1cDBwE+BIuAWwo1vLoN+/fhqLv/VM8wcP4obPn4sc/yqI+fcIEhyxvA+4CjgcQAzWyvJO9bLoI4O43u/e5EfPbiCE+dW8D/nH+0jpjnnBk2SgmGPmVnnqG2S/LA0g1pa2/n8L57it0+v45zqWfz7excystBvVnPODZ4kOcqdkn4MjJP0D8DvgZ8k2bikMyS9IGmFpMt7WeaDkpZLelbSbclDzz8bt7VwznWPcO8z6/jSOw7hm393mBcKzrlBl+QGt/+S9FbCcJ4HA1eY2e/6W09SAXAN8FZgNVAjabGZLU9ZZh7wJeAkM2uUNHk/05Hznl/fzIWLamnYsYdrzz+Gty+YmumQnHM5KknjcynwRzP7naSDgYMlFXV2qteH44AVZvZK3M4dwFnA8pRl/gG4xswaAcxs4/4kItc9+PxGLrntccaUFPKLT57Iwhk+ToJzLn2S1EM8DBRLmkGoRvoEsCjBejOAVSnvV8dpqQ4CDpL0F0mPSDqjpw1JukhSraTaTZs2Jfjo3LHoL69y4U01VE0s5a5/OtkLBedc2iUpGGRmO4H3Az80s/cRxmfod70eplm394XAPODNwHnA9ZLGvWYls+vMrNrMqidNmpTgo7NfW+ze4sq7l/OWQ6dw58UnMrXc+zxyzqVfkquSJOlE4MPAhQNYbzWQegvuTGBtD8s8EqulXpX0AqGgqEmw/ZzV3NLKp297giUvbuKiU+byxTMOocC7t3DODZEkZwyXEhqIf2Nmz0qaCzzYzzoQMvd5kuZIGgmcCyzutsz/AqcCSJpIqFp6JWnwuWhVw07O/p+/8pcVm/nm+w/jy2ce6oWCc25IJbkqaQmwJOX9K0C/4z2bWZukS4D7gQLgxliwXAXUmtniOO9tkpYD7cC/mNmW/UtK9lta38jFN9eyp62Dmy44jpMOnJjpkJxzeUhm3av94wzpB2Z2qaS7eW3bAGb2nnQH15Pq6mqrra3NxEen1eKn1nLZL55iWnkJN3zsWA6cPCbTITnncoikpWZWnWTZvs4Ybo7P//X6Q3K9MTOu/sMKvv/7FzmuagLXfuQYJpR69xbOuczptWAws6XxZQGhgXjn0ISUP1pa27n8V0/zv0+u5f1Hz+Ab7z+M4kIfWMc5l1lJri76OHCtpC3An+Ljz503pbn9s2X7bi6+eSm19Y38y9sP5lNvPgDJG5mdc5mXpPH5owCSpgNnE7q5mJ5kXdezlzZs44KbatjYvJtrPnQ07zx8WqZDcs65vZJ0iXE+8EbgMGAz8CPCWYPbD396aROfuvVxigsL+PnFJ3LkrNfcz+eccxmV5Kj/B8DLwLXAg2ZWl9aIctgtj9Tz1cXPMm/yGG74+LHMGDcq0yE559xrJKlKmihpAXAK8B+xR9QXzOwjaY8uR7R3GP/x2+e48S+vctohk7n6vKMYU+w1cc654SlJVdJYYDZQCVQB5UBHesPKHWbGJbc9zn3L1vOJk6r4yjvn+53Mzrlh
Lclh659THj8ys9XpDSm31G/ZyX3L1vNPpx7Av7z9kEyH45xz/eqzYIiD7TxgZpcNUTw5p6auAYD3Htm9x3HnnBue+uxEz8zagaOGKJacVFvXyLjRRRwwybu4cM5lhyRVSU9KWgz8AtjROdHMfp22qHJITX0D1ZXjGeHtCs65LJGkYJgAbAFOS5lmgBcM/diyfTevbNrBB6tn9b+wc84NE0kuV/3EUASSi2rrQ68hx1aNz3AkzjmXXK8Fg6QvmNm3Jf2Qnrvd7ndMhnxXW9fAyMIRPk6zcy6r9HXG8Fx8zr3BD4ZITV0jR84c5z2mOueySl/dbt8dn28aunByx6497Sxb08RFp8zNdCjOOTcgSe58ngR8EZgPlHRON7PTel3J8eSqrbR1GMdWTch0KM45NyB93scQ3UqoVpoDfA2oA2rSGFNOqK1rQIKjZ3vDs3MuuyQpGCrM7Aag1cyWmNkFwAlpjivr1dQ3cvCUMspHF2U6FOecG5AkBUNrfF4n6Z2SjgJmpjGmrNfeYTxe30i1X6bqnMtCSW5w+7qkcuDzwA+BscA/pzWqLPf8+ma2727z9gXnXFZKcoPbPfFlE3BqesPJDbV14ca2ai8YnHNZqN+qJElzJd0tabOkjZLukuTXYPahpq6B6eUlPkKbcy4rJWljuA24E5gKTCd0pnd7OoPKZmZGTV2Dny0457JWkoJBZnazmbXFxy300EWGC1Y37mJD827vH8k5l7WSND4/KOly4A5CgXAO8FtJEwDMrCGN8WWd2vrwdfgZg3MuWyUpGM6Jzxd3m34BoaDw9oYUNXWNlJUUctCUskyH4pxz+yXJVUlzhiKQXFFb18AxleMp8IF5nHNZKkkbg0to6849vLhhu9+/4JzLal4wDKKlcWCe6kpveHbOZS8vGAZRTV0jRQXiiFnjMh2Kc87ttyQ3uF3Y7X2BpK+mL6TsVVvXwGEzyikp8oF5nHPZK8kZw1sk3StpmqSFwCOAX3LTTUtrO0+vbvL2Bedc1ktyVdKHJJ0DPAPsBM4zs7+kPbIs88yaJva0d/j9C865rJekKmke8FngV4RBej4iaXSa48o6NXXhxrZjvOHZOZflklQl3Q38m5ldDLwJeAkfwe01ausaOXDyGCaUjsz6HgXsAAAWvklEQVR0KM4597okufP5ODNrBjAzA74raXF6w8ouHR1GbV0D7zx8WqZDcc651y1JG0NzbHSeD5SkzHopbVFlmZc2bqe5pY3qSm9fcM5lvyRtDF8ljNz2Q8JAPd8G3pNk45LOkPSCpBWxI77eljtbkkmqThj3sNLZvuBXJDnnckGSNoazgbcA683sE8ARQHF/K0kqAK4B3kE42zhP0vwelisDPgM8OoC4h5XaugYmlxUza4IPzOOcy35JCoZdZtYBtEkaC2wkWY+qxwErzOwVM9tD6Lb7rB6W+3fCWUhLwpiHnZq6Ro6tmoDkHec557JfkoKhVtI44CfAUuBx4LEE680AVqW8Xx2n7SXpKGBWyrjSPZJ0kaRaSbWbNm1K8NFDZ+3WXazZuotqH5jHOZcjkjQ+fyq+vFbS/wFjzezpBNvu6fB578hvkkYA3wc+niCG64DrAKqrq4fV6HG1seM8b19wzuWKJJerIulwoKpzeUkHmtmv+1ltNTAr5f1MYG3K+zJgIfBQrIKZCiyW9B4zq00U/TBQW9dA6cgCDpnqvYQ453JDvwWDpBuBw4FngY442YD+CoYaYJ6kOcAa4FzgQ50zzawJmJjyOQ8Bl2VToQChfeHoyvEUFnhHtc653JDkjOEEM3vN1UT9MbM2SZcA9wMFwI1m9qykq4BaM8v6m+SaW1p5fn0zl77loEyH4pxzgyZJwfA3SfPNbPlAN25m9wL3dpt2RS/Lvnmg28+0x+sbMYNjveHZOZdDkhQMNxEKh/XAbkKjspnZ4WmNLAvU1jVSMEIcOdsH5nHO5Y4kBcONwEcI3W539LNsXqmpa2Dh9LGMHpmoDd8557JCkhxt
ZS60Bwy2PW0dPLlqK+efUJnpUJxzblAlKRiel3Qbofvt3Z0TE1yumtOWrW1id1uHty8453JOkoJhFKFAeFvKtCSXq+a02r0D8/iNbc653NJrwSDpPOCB2HGe66amrpE5E0uZVNZvf4LOOZdV+jpjqAR+IakI+ANwH/BYHKwnr5mFgXlOP3RKpkNxzrlB1+vtumb2TTM7DTgTeAq4AHhc0m2SPiopb3PFlzftoHFnq/eP5JzLSUk60dsG/CY+iGMqvAP4GfD2tEY3THUOzOM9qjrnclHSTvRmEKqWOpevMbPvpi2qYa6mroGK0pHMmVia6VCcc27QJelE71vAOcByoD1ONuDhNMY1rNXWNVJdNd4H5nHO5aQkZwzvBQ42s939LpkHNjS3sLJhJx890W9sc87lpiR9Rb8CFKU7kGxRWxcG5qn2hmfnXI5KcsawE3hS0h/oeufzZ9IW1TBWU9dASdEIFkwfm+lQnHMuLZIUDIvjwwG19Q0cNWs8RT4wj3MuRyW5XPWmoQgkG2zf3cbytc1ccuqBmQ7FOefSpq8uMe40sw9KeoZwFVIX+TgewxMrG+kwb19wzuW2vs4YPhuf3zUUgWSDmrpGRgiOrvQb25xzuavXgsHM1sXnegBJY/taPh/U1jUwf/pYxhTn9dfgnMtx/bagSrpY0gbgaWBpfNSmO7DhprW9gydWbqXau9l2zuW4JIe+lwELzGxzuoMZzpavbWZXa7t3nOecy3lJrrl8mXAvQ17zjvOcc/kiyRnDl4C/SnqUPL7BrbaukdkTRjNlbEmmQ3HOubRKUjD8GPgj8AzQkd5whiczo7a+gVMOmpTpUJxzLu2SFAxtZva5tEcyjNVt2cnm7Xu8fcE5lxeStDE8KOkiSdMkTeh8pD2yYaSzfeFYb19wzuWBJGcMH4rPX0qZZsDcwQ9neKqta2D86CIOmDQm06E451zaJekrac5QBDKc1dY1ckzlBB+YxzmXF7yL0H5s3r6bVzbv8Gok51ze8IKhHz4wj3Mu3/RaMEg6KT4XD104w09tXQPFhSNYOMMH5nHO5Ye+zhiujs9/G4pAhqua+kaOmDWO4sKCTIfinHNDoq/G51ZJPwVmSLq6+8x8uPN55542nl3TxMVvypsLsJxzrs+C4V3A6cBphB5V886Tq7bS1mHevuCcyyt9jcewGbhD0nNm9tQQxjRs1NY1IsHRs/2KJOdc/khyVdIWSb+RtFHSBkm/kjQz7ZENAzV1DRw8pYzyUUWZDsU554ZMkoLhp8BiYDowA7g7Tstpbe0dPF7f6P0jOefyTpKCYbKZ/dTM2uJjEZDz3Yw+v34bO/a0+/gLzrm8k6Rg2CTpfEkF8XE+sCXdgWVa7d6O8/yMwTmXX5IUDBcAHwTWA+uAs+O0fkk6Q9ILklZIuryH+Z+TtFzS05L+IKlyIMGnU019IzPGjWL6uFGZDsU554ZUkk70VgLvGeiGJRUA1wBvBVYDNZIWm9nylMWeAKrNbKekfwS+DZwz0M8abGZGbV0DJ8ytyHQozjk35NLZV9JxwAoze8XM9gB3AGelLmBmD5pZ53jSjwDD4mqn1Y272NC82+9fcM7lpXQWDDOAVSnvV8dpvbkQuK+nGXGgoFpJtZs2bRrEEHvmA/M45/JZOguGngYvsB4XDA3a1cB3eppvZteZWbWZVU+alP4LomrqGikrKeSgyWVp/yznnBtu+i0YJJVL+n7nEbuk70oqT7Dt1cCslPczgbU9bP904F+B95jZ7qSBp1NtXQPVleMZMcIH5nHO5Z8kZww3As2EK5M+GF8nucGtBpgnaY6kkcC5hBvl9pJ0FPBjQqGwcSCBp0vjjj28tHG7ty845/JWkjGfDzCzv0t5/zVJT/a3kpm1SboEuB8oAG40s2clXQXUmtliQtXRGOAXcdjMlWY24CugBtPS+jAwj9+/4JzLV0kKhl2STjazP8PeAXx2Jdm4md0L3Ntt2hUpr08fQKxDoqa+gZEFIzh8ZpLa
Muecyz1JCoZ/BG6K7QoCGoCPpzOoTKqta+SwmeWUFPnAPM65/JTkBrcngSMkjY3vm9MeVYa0tLbz9OqtXHDynEyH4pxzGdNrwSDpfDO7RdLnuk0HwMy+l+bYhtzTq5tobTeOrfT2Bedc/urrjKE0Pvd0MX+P9yNku84b246p9BvbnHP5q68R3H4cX/7ezP6SOi82QOec2roG5k0ew/jSkZkOxTnnMibJfQw/TDgtq3V0GLX1jX7/gnMu7/XVxnAi8AZgUrd2hrGE+xJyyosbt7Gtpc37R3LO5b2+2hhGEm4+K6RrO0MzYUyGnFJT5ze2Oecc9N3GsARYImmRmdUPYUwZUVvXwJSxxcwc7wPzOOfyW5Ib3HZK+g6wACjpnGhmp6UtqgyorQvtC52X4zrnXL5K0vh8K/A8MAf4GlBH6CAvZ6zZuos1W3dxrF+m6pxziQqGCjO7AWg1syVmdgFwQprjGlK18f4FvyLJOeeSVSW1xud1kt5JGFNhWAzBOVhq6xoZU1zIIVN9YB7nnEtSMHw9dqD3ecL9C2OBf05rVEOspq6Bo2aPo7AgnQPaOedcduizYJBUAMwzs3uAJuDUIYlqCDXtauWFDds487BpmQ7FOeeGhT4Pkc2sHcjowDnp9vjKRsyg2m9sc845IFlV0l8l/Qj4ObCjc6KZPZ62qIZQbV0DhSPEkbPGZToU55wbFpIUDG+Iz1elTDMgJ+5jqKlrZMGMckaPTPJVOOdc7ksyUE/OtSt02t3WzpOrtvLREyozHYpzzg0beX0ZzrI1Texp6/D7F5xzLkVeFwydHed5w7Nzzu2T1wVDbV0DcyeWMnFMcaZDcc65YSNRi6ukNwBVqcub2c/SFNOQ6ByY523zp2Q6FOecG1b6LRgk3QwcADwJtMfJBmR1wfDypu1s3dnq7QvOOddNkjOGamC+mVm6gxlKPjCPc871LEkbwzJgaroDGWq1dQ1MHDOSqorRmQ7FOeeGlSRnDBOB5ZIeA3Z3TjSzrO4qo6a+gepKH5jHOee6S1IwXJnuIIba+qYWVjXs4mMnVmU6FOecG3aS3Pm8ZCgCGUq19WFgHm9fcM651+q3jUHSCZJqJG2XtEdSu6TmoQguXWrrGhlVVMD86WMzHYpzzg07SRqffwScB7wEjAL+Pk7LWp0D8xT5wDzOOfcaiXJGM1sBFJhZu5n9FHhzWqNKo20trTy3rtnvX3DOuV4kaXzeKWkk8KSkbwPrgNL0hpU+T6zcSofBsd4/knPO9SjJGcNH4nKXEAbqmQX8XTqDSqfaugZGCI6a7QWDc871JMlVSfWSRgHTzOxrQxBTWtXUNTJ/+ljGFPvAPM4515MkVyW9m9BP0v/F90dKWpzuwNKhtb2DJ1Y1Ul3p7QvOOdebJFVJVwLHAVsBzOxJQk+rWefZtc20tHb4/QvOOdeHJAVDm5k1pT2SIVBbF25s84F5nHOud0kq2pdJ+hBQIGke8Bngr+kNKz1q6hqYPWE0U8aWZDoU55wbtpKcMXwaWEDoQO92oBm4NMnGJZ0h6QVJKyRd3sP8Ykk/j/MflVSVPPSBMTNq6xr9bME55/qR5KqkncC/xkdikgqAa4C3AquBGkmLzWx5ymIXAo1mdqCkc4FvAecM5HOSenXzDrbs2OPtC845149eC4b+rjxK0O32ccAKM3slbu8O4CwgtWA4i329t/4S+JEkpWNQoNq9A/P4GYNzzvWlrzOGE4FVhOqjR4GBDlwwI67faTVwfG/LmFmbpCagAticupCki4CLAGbPnj3AMIJxo4t42/wpHDBpzH6t75xz+aKvgmEqoRroPOBDwG+B283s2YTb7qkg6X4mkGQZzOw64DqA6urq/TqbeNuCqbxtQc4NROecc4Ou18bn2GHe/5nZx4ATgBXAQ5I+nXDbqwndZ3SaCaztbRlJhUA50JBw+84559Kgz8ZnScXAOwlnDVXA1cCvE267BpgnaQ6wBjiXcOaRajHwMeBvwNnAH9PRvuCc
cy65vhqfbwIWAvcBXzOzZQPZcGwzuAS4HygAbjSzZyVdBdSa2WLgBuBmSSsIZwrn7mc6nHPODRL1doAuqYPQmyp0rfcXYGaWkeHPqqurrba2NhMf7ZxzWUvSUjOrTrJsr2cMZubDmznnXB7yzN8551wXXjA455zrwgsG55xzXfTa+DxcSdoE1O/n6hPpdld1Hsi3NOdbesHTnC9eb5orzWxSkgWzrmB4PSTVJm2VzxX5luZ8Sy94mvPFUKbZq5Kcc8514QWDc865LvKtYLgu0wFkQL6lOd/SC57mfDFkac6rNgbnnHP9y7czBuecc/3wgsE551wXeVEwSDpD0guSVki6PNPxDBZJN0raKGlZyrQJkn4n6aX4PD5Ol6Sr43fwtKSjMxf5/pM0S9KDkp6T9Kykz8bpOZtuSSWSHpP0VEzz1+L0OZIejWn+uaSRcXpxfL8izq/KZPz7S1KBpCck3RPf53p66yQ9I+lJSbVxWkb265wvGCQVANcA7wDmA+dJmp/ZqAbNIuCMbtMuB/5gZvOAP8T3ENI/Lz4uAv5niGIcbG3A583sUMIAUv8Uf89cTvdu4DQzOwI4EjhD0gnAt4DvxzQ3AhfG5S8EGs3sQOD7cbls9FnguZT3uZ5egFPN7MiU+xUys1+bWU4/CGNX35/y/kvAlzId1yCmrwpYlvL+BWBafD0NeCG+/jFwXk/LZfMDuIswBG1epBsYDTxOGD99M1AYp+/dzwljoJwYXxfG5ZTp2AeYzpmEjPA04B5Cd/85m94Yex0wsdu0jOzXOX/GAMwAVqW8Xx2n5aopZrYOID5PjtNz7nuIVQZHAY+S4+mO1SpPAhuB3wEvA1vNrC0ukpquvWmO85uAiqGN+HX7AfAFoCO+ryC30wth3JsHJC2VdFGclpH9us+hPXOEepiWj9fo5tT3IGkM8CvgUjNrlnpKXli0h2lZl24zaweOlDQO+A1waE+LxeesTrOkdwEbzWyppDd3Tu5h0ZxIb4qTzGytpMnA7yQ938eyaU1zPpwxrAZmpbyfCazNUCxDYYOkaQDxeWOcnjPfg6QiQqFwq5l1jkGe8+kGMLOtwEOE9pVxkjoP7lLTtTfNcX45YejcbHES8B5JdcAdhOqkH5C76QXAzNbG542Ewv84MrRf50PBUAPMi1c0jCSMK704wzGl02LgY/H1xwh18J3TPxqvZjgBaOo8Rc0mCqcGNwDPmdn3UmblbLolTYpnCkgaBZxOaJR9EDg7LtY9zZ3fxdnAHy1WRGcDM/uSmc00syrC//WPZvZhcjS9AJJKJZV1vgbeBiwjU/t1phtchqhR50zgRUK97L9mOp5BTNftwDqglXAEcSGhbvUPwEvxeUJcVoSrs14GngGqMx3/fqb5ZMIp89PAk/FxZi6nGzgceCKmeRlwRZw+F3gMWAH8AiiO00vi+xVx/txMp+F1pP3NwD25nt6Ytqfi49nOfCpT+7V3ieGcc66LfKhKcs45NwBeMDjnnOvCCwbnnHNdeMHgnHOuCy8YnHPOdeEFgxtSkkzSd1PeXybpykHa9iJJZ/e/5Ov+nA/E3l0fTBqPpOsz0XmjpE9K+uhQf67LbvnQJYYbXnYD75f0DTPbnOlgOkkqsNDtRBIXAp8ysz4LhlRm9vf7F9nrY2bXZuJzXXbzMwY31NoIY9f+c/cZ3Y/4JW2Pz2+WtETSnZJelPRNSR+OYxQ8I+mAlM2cLulPcbl3xfULJH1HUk3su/7ilO0+KOk2wk1C3eM5L25/maRvxWlXEG6yu1bSd7otL0k/krRc0m/Z1+EZkh6SVN2ZLknfip2l/V7ScXH+K5LekyDmhyT9UtLzkm6Nd4MTv5flcfn/itOulHRZfH2kpEfi/N9oX9/+D8V4Hovf2xvj9AVx2pNxnXmJf2WX1fyMwWXCNcDTkr49gHWOIHQc1wC8AlxvZscpDNTzaeDSuFwV8CbgAOBBSQcCHyV0GXCspGLgL5IeiMsfByw0s1dTP0zSdEK/
/scQ+v5/QNJ7zewqSacBl5lZbbcY3wccDBwGTAGWAzf2kJZS4CEz+6Kk3wBfJ3QdPh+4idDdwYV9xHwUsIDQN85fgJMkLY+ff4iZWWcXGt38DPi0mS2RdBXw1ZTvrTB+n2fG6acDnwT+28xuVehOpqCHbboc5GcMbsiZWTMhk/rMAFarMbN1Zrab0A1AZyb5DKEw6HSnmXWY2UuEAuQQQr8zH1XotvpRQjcDnUe/j3UvFKJjCZn3JgtdOd8KnNJPjKcAt5tZu4UO0f7Yy3J7gP9LiX+JmbV2S0t/Ma82sw5ClyBVQDPQAlwv6f3AztQPlFQOjDOzJXHSTd3S09kZ4dKUGP4GfFnSF4FKM9vVT/pdjvCCwWXKDwhHxaUp09qI+2SsHhmZMm93yuuOlPcddD3z7d7HixH6lfm0hZGxjjSzOWbWWbDs6CW+Xvvx7keSPmZabV9fNHvTEjP6zrT0FXPqd9FOONpvI5z9/Ap4L/sKnqQ6t9neGYOZ3Qa8B9gF3B/PlFwe8ILBZYSZNQB3sm94RggjWB0TX58FFO3Hpj8gaURsd5hLGNnqfuAfFbrrRtJBCj1Y9uVR4E2SJioMD3sesKSfdR4Gzo3tA9OAU/cj/k4DillhfIpyM7uXUD10ZOp8M2sCGjvbD4CP0E96JM0FXjGzqwnVW4fvb2JcdvE2BpdJ3wUuSXn/E+AuSY8RepLs7Wi+Ly8QMrwpwCfNrEXS9YTqkcfjmcgmwlF1r8xsnaQvEbp6FnCvmd3V1zqEPvRPI1QJvUj/BUlfBhpzGeG7K4nxvqZxn9Bt87WSRhOq2T7RTwznAOdLagXWA1cNKAUua3nvqs4557rwqiTnnHNdeMHgnHOuCy8YnHPOdeEFg3POuS68YHDOOdeFFwzOOee68ILBOedcF/8fMEC5ohcEyFEAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def run_d_n(dim,N_pts,L):\n",
    "    pts=np.random.rand(N_pts,dim)-0.5 # simulate N_pts points on dim dimensions space\n",
    "    ratio_list=[]\n",
    "    for i in range(N_pts):\n",
    "        # ignore the data point itself\n",
    "        selected_pts=np.array([j for j in range(N_pts) if j!=i])\n",
    "        # calculate the L2 or L1 distance with other points\n",
    "        dist=np.linalg.norm(pts[selected_pts]-pts[i],L,axis=1)\n",
    "        #print(\"dist is: \",dist)\n",
    "        # calculate the ratio of the min. distance to the max. distance\n",
    "        ratio=np.min(dist)/np.max(dist)\n",
    "        ratio_list.append(ratio)\n",
    "    # output the mean ratio\n",
    "    return np.mean(ratio_list)\n",
    "\n",
    "# Initialise the N_pts, the number of points we simulate\n",
    "N_pts=1000\n",
    "# Setting l=2 to calculate the L2 distance\n",
    "l=1\n",
    "# Setting the number of dimensions we simulate\n",
    "check_dim=range(1,550,50)\n",
    "# Calculate the mean ratio on that dimension\n",
    "ratio_list=[ run_d_n(dim,N_pts,l) for dim in check_dim]\n",
    "# Plot the ratio with its corresponding dimension\n",
    "plt.plot(check_dim,ratio_list)\n",
    "plt.ylabel(\"Mean ratio of min/max pairwise distances\")\n",
    "plt.xlabel(\"Number of dimensions\")\n",
    "plt.title(\"Effect of increasing dimensionality on pairwise distances\")\n",
    "plt.xticks(np.arange(0, 600, step=100))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Question:** how can this plot be interpreted ? How else could you visualize this effect ?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#1.The average range of pairwise distances increase as the number of dimensions increases\n",
    "#2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) Implement Nearest Neighbour from scratch\n",
    "\n",
    "The following will give some practise in implementing a simple classifier, the $k$-Nearest Neighbour ($k$NN) algorithm. It should help us to write a $k$NN package from scratch. Most machine learning methods include two main steps, namely training (fitting to a model to the training data) and prediction (running the model on input data  to generate output). However, in the $k$NN algorithm, since there is no explicit model-building step, we only require implementation of the prediction step without a training step."
   ]
  },
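  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a compact statement of the prediction step, here is one common way to write the $k$NN rule. The symbols $q$, $N_k(q)$, $y_i$ and $\\hat{y}$ are notation assumed here for illustration, and how ties are broken is left open:\n",
    "\n",
    "$$\\hat{y}(q) = \\arg\\max_{c} \\sum_{i \\in N_k(q)} \\mathbb{1}[y_i = c],$$\n",
    "\n",
    "where $N_k(q)$ is the set of indices of the $k$ training points closest to the query $q$ and $y_i$ is the label of training point $i$. With $k = 1$ this reduces to copying the label of the single nearest neighbour."
   ]
  },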
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(500, 2) (500, 2)\n"
     ]
    }
   ],
   "source": [
    "mean_01 = np.array([1, 0.5])\n",
    "cov_01 = np.array([[1, 0.1], [0.1, 1.2]])\n",
    "\n",
    "mean_02 = np.array([4, 5])\n",
    "cov_02 = np.array([[1, 0.1], [0.1, 1.2]])\n",
    "\n",
    "dist_01 = np.random.multivariate_normal(mean_01, cov_01, 500)\n",
    "dist_02 = np.random.multivariate_normal(mean_02, cov_02, 500)\n",
    "print(dist_01.shape, dist_02.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have created two 2-dimensional normal distributions of data points with the same covariance but different means."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting the created Data "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What does the data look like ? Notice the 2 unique clusters being formed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD8CAYAAACSCdTiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3X+UXFWVL/Dv7ko1VFrtTj+imE4i0WGFp5AQ04uJxjejRAgMCC1IQNGF4pjl0lEBJxKEgeDyPYPxgfgcZ1YG8McCNTHEBhEmQXDeLJgJ0CG/RMygBpM0IO0LaSUp09Xd+/1x61bfunV/31t1q259P2tp0pWqWyfdYd9T++yzj6gqiIgomzrSHgAREdUPgzwRUYYxyBMRZRiDPBFRhjHIExFlGIM8EVGGBQ7yInKXiLwsIr+wPNYrIg+LyHPlX2fUZ5hERBRFmJn8dwCcY3tsNYBHVPVkAI+UvyYioiYhYTZDichJAB5Q1VPLX+8F8G5VfVFE3gjg31R1fj0GSkRE4U2L+fo3qOqLAFAO9K93e6KIrASwEgC6uroWn3LKKTHfmoiovWzfvv0PqjozzGviBvnAVHU9gPUA0N/fr0NDQ416ayKiTBCR34V9Tdzqmt+X0zQo//pyzOsREVGC4gb5+wFcUf79FQDui3k9IiJKUJgSyh8A+E8A80XkoIh8HMBaAGeJyHMAzip/TURETSJwTl5VP+jyR8sSGgsRESWMO16JiDKMQZ6IKMMY5ImIMoxBnogowxjkiYgyjEGeiCjDGOSJiDKMQZ6IKMMY5ImIMoxBnogowxjkiYgyjEGeiCjDGOSJWsHujcBtpwJreoxfd29Me0TUIhp2MhQRRbR7I/CTzwKlovH16AHjawBYsCK9cVFL4EyeqNk98qWpAG8qFY3HiXwwyBM1u9GD4R4nsmCQJ2p23bPDPU5kwSBP1OyW3QjkC9WP5QvG40Q+GOSJmt2CFcD7vgF0zwEgxq/v+wYXXSkQVtcQtYIFKxjUKZJEZvIicrWIPCMivxCRH4jI8Ulcl4iI4okd5EWkD8BnAfSr6qkAcgAui3tdoqbATUjU4pJK10wDUBCREoDpAF5I6LpE6eEmJMqA2DN5VR0G8DUA+wG8CGBUVbfanyciK0VkSESGRkZG4r4tUf1xExJlQBLpmhkALgQwD8AsAF0i8mH781R1var2q2r/zJkz474tUf1xE5KBKauWlsTC63sB7FPVEVUtAdgM4J0JXJcoXX6bkNoh+Jkpq9EDAHQqZZXFv2tGJRHk9wNYIiLTRUQALAPwbALXJUqX1yakJIJfK9wkmLJqeUnk5J8AsAnA0wD2lK+5Pu51iVLntQkpbvBrlRkyU1YtL5HqGlW9CcBNSVyLqKm4bUKKG/y8bhLNVLnTPbt8I3J4nFoC2xoQRRG3aVjYm0RaqR32zWl5DPJEUcQNfmFuEk6pnc0rgTXd9Q/47JvT8kRVG/6m/f39OjQ01PD3JUrU7o1GemX0oBGcl90YPPjZN1oBxk3CKYDedqpzysSq0Auce0u6wTfO94MCEZHtqtof6jUM8kQpCRoU1/QACPLfqQD9VwLn35r0SP253bQWfgh4bisDf0IY5Kl9NcMssl5jCDKTt8p3AaWj8cfwwDXA9u8AOgFIDlj8UfcbiOsYBVU3KLdPKxRIlCDPnDy1vmYoR6zXGHZvBMaOhHtN6Yj/GPwWch+4Bhi60wjwgPHr0J3G405cq4psk0jW2Dccgzy1vmbYsJPkGCoBuNtYYC0eij4upzEEuSFt/47z9dweD1NSyRr7hmKQp9bXDBt2go7BbwZdFYCBYLl4v7EdqH7vzZ/wvyGZM3g7t8edqo0gzs9ljX1DMchT62uGg66DjCHIDNrpE0FcknO4eTiw3pAk534tJ06llv1Xssa+CTDIU+trhg07QcYQJKVTj08fOuE8e7ez3pAWf9T5OW6PA0agv/oX
wJrDxq/n38oa+ybAM16p9ZlBw6+ypZ4VOE5jOPls4+vNK93bAwDVgd3refVkvyGZVTRBq2vc8Gza1LGEktqDUx13Rx447rVA8ZXkg77T+9nLCU3dc4yZ7+6NwEPXxltojaJ7DtD7ZuD5x+IFdKq7KCWUnMlTNtln7WNHatMVk6WpgBr2aD+/TwUPXeuQHnEI8B15Y8Z/y7zGB3ezZn3/NqM80mSWSwIM9BnAmTxlj+MsOiBzVh32+tZNPrs3GjnwIKQD0Mnw44yr0Gv8WnwFrhU8kgNuavCNhzxxMxQREK9CJcjCp9sC6uZPGCWKD10b/P0aHeA78saO2OKh8icHj0meTjTvYSYUGNM1lD1xKlQKM+JdP41FUy/5LmDaccaMvTADGHu1vCM2oNEDwH2fNn7PBdSWxJk8NbcofdTdatYLvVPlfG4bdYKMQyL+ZxP1dXGUjhgzdukAxo8BE2PhrzExFu7TCTUVBnlqXlH7wbjVrJ97y1Qdt5viodqbiX0cbrs+/aSw/jX13hPhZvB21kXhuAeYtMLZthnChVdqXm6dDYMujprVL2YKxkxZAAEqWcpte+cuAX78yeiBvR7M3vFBF3eTsmbUvTQ0aIvjMH30qQZbDVO2uPZRF+/ZuFWcShsAyHV6pzg68oBItDRIGE43tjXd9X1Pq0IvcO0+75bCF633D9RxbtyUXnWNiPSIyCYR+ZWIPCsi70jiutTmkuhJE7cXjF/wniwBna+ZKkmsl7EjtWmNer9nRYfxyQHwbikcpONmMzSTazNJ5eRvB/CvqnoKgIUAnk3outTOkuhJ04jgUSlHrPN7WNcjGpnHzlmK8LxusKMH/MfVDM3k2kzsIC8irwPwVwDuBABVHVPVgJ+liTwkcYh0kOBR6EXoaps0mLX4SfSZD8NaXbPsRnh+r/wWxpuhmVybSWIm/2YAIwC+LSI7ROQOEemyP0lEVorIkIgMjYyMJPC21BbsnQ3DLs459jm3MKtu+q9ETfDqyIcebuM0eC2teMgI3gtWOH+vTH4HpSRx46ZQYi+8ikg/gG0AlqrqEyJyO4A/quo/uL2GC6/UUG6VNvaeM079aBpdwdLMrIujfq0buufw8O46SKW6RkROBLBNVU8qf/0/AKxW1fPcXsMgTy0j7CHamSdTgfuRL/Hw7gZLpbpGVV8CcEBE5pcfWgbgl3GvS5SoqBtwTj67vuNqOZZNaSef7XLkn8Ph3dwxm5qketd8BsA9ItIJ4LcAPpbQdandxT3ow6lHe9C2wrs3Ak9/L9q4W1lHHsAkMOmxAaxUBJ7baszQrT8ft089Zk4f8P551vNglzbFzVDUvOLujvTbCOW3AacdUzWFXuBt7zdubpMlnyc7bErz+p51dhldN71aNHM3rCe2GqZsCXImatjXW/nV0DdbgO/sKlel1FHxkHHkn2+Ah3N5qlcppNPBLdafZ9yfNzlikKf0+OXJXXdHHgiWV/cL4n419JLz/vNGGzsCHG1AbXyQPj1ute1RZtzmz4m7YeuCQZ7SEaTDpN/uSr+NN16v99uAs3tjczUlM8XpJBmX5BCott213YJLbb35c+Ju2LpgkKd0BPlo7reRye+jvNvrC73eQeqBa4wdpVTt/f8cbFPaubfUbiTryBubqLx2u3I3bF3wZChKR5CP5mYgca3H9rhOzesDVmvs3ggM3YWG7yhtBUFTMV7f97lL3H8eUX5e5IvVNZSOsC1ngzw/ifK7dqyoCYKHejcFVtdQ6wj70dzv+U45/sFPAbfMC7cBql0X+Qq9xnmwbhZ/lCc6tSgGeUpH2EZVfs93yvFPlsqboEIcHdiui3zjx4wDv530f9xIs0Q5ipFSx3QNNb8gaRjXU6RsJGcsIHotug7dGXvILS9fABZ+yNjVOnrQOAjcqdqIJzo1VJR0DRdeqbnZd0G6tSTw2lJvpRPAfZ+ufb3pua3er5cOY9dm1pWK1Tc7t3LSoOkttitIDdM11NyC7oL0K7e0sh6CYc8z
+90o2iHAhxEkvRVkTwTVDYM8NbeguyDtOXu/k57Mhln24NMKJ0Q1i6A17GxXkCoGeWpuYXZBWk+RCsKxt42iNQN9g8csHcEbhyXdroBVPqEwyFNzi7oLMkgawTU104oboSKMOdcZ/XxbnTRukkECbZLtCpj6CY1BnpqbU+nkwg/5B5hAOXq3Xipz6t/tsRlMjBmdLdccjvb3DRpok2xXwNRPaKyuofQErbhYsKL6HNYg1Tbm7+0HhlRxmP1ag087nO86etD4no7FbHxmBlq3nx+QTHUNO1WGxjp5SkfUAyLCtkMw38srYBd6nQ/2/p+z0u362AiFXmC86N13PzCHQ0SSFuXnnyFsa0CtI+rH7igzuQUrvNMRZsrC3l2xdNR7LFkwcSyhAI/G7BZmp8rQGOQpHVE/dkddxPMKAqMHnPPJ7dDiIEqapiNX20q4UYE2bDsMYpCnlMQJ1lFmcgtWeBxmAeeFQ84OnR3XDQx8K71Aay2V9ettT8kFeRHJicgOEXkgqWtShp18NmqqW4IG66gzuXNvca+4KRXx0uYvYnDHcJDRt5Z8l/tRhoVe55vmvL92v17xEFsUtJAkq2s+B+BZAK9L8JqURbs3Aru+j+rqFjFKI4MEC2u1TRjma1wWYd+gI3jsx98C8CkM5B6fqtppdaUjRifJXd+vXeg+9xbj9+bBLJIznvP8Yx4XlKnFT7fqJmoaiczkRWQ2gPMA3JHE9Sjj3Haa+jUHS4LHIqwIsE6+iQvve6txI0hqQbIZPLfV/RPQghVTaTCzEZnn+ba2ijzWqTe1pGbyXwfwBQCvdXuCiKwEsBIA5s6dm9DbUkuqd62zX/39shtryzfLJFJ3AAEKMzzq8ZvA6AHje+KWWnG88Ya5PuvUm1XsIC8i5wN4WVW3i8i73Z6nqusBrAeMOvm470utY3DHMNZt2YsXDhcxq6eAhwsnYnrxxdonJlHN4rVZCpgK/oUZ0FIxuY4v597ieuNoGqMHjE8oD11rjNca7OMG6XaoRGpRSaRrlgK4QESeB/BDAGeKyN0JXJcyYHDHMK7bvAfDh4tQAMOHi7jxyMUYzx1f/cSkSvDc6u8fura650mSs+7u2UbAXPih5K5ZT8VDtdVEoYJ0hAVzSk3sIK+q16nqbFU9CcBlAB5V1Q/HHhllwrote1EsVed3N429E1+WT9anBM9tRlo8VBP8BUm0IpOpANeINYWkmDc+s5tjqHp5ZZ16C2HvGqqrFw47py++++oZWHPDzcm/YdATosqk8v8Rw33/lUaA270x2Ps208lSxUNTn2iKh4xNTpNeC65lQVsI8DSoppDoZihV/TdVPT/Ja1Jrm9XjXJfu9nhsTpulcp0+L1KE/09BjLLE82+dWgcIouEBPsSqw+SE0eLB6zVBUzNsCdw0uOOV6mrV8vko5Ks34hTyOaxaPr8+b+i0WarzNQFeGCL4Sg64aL0R4IFglSlum5HqTr13+tqNHYHrp5owqRm2BG4aDPJUVwOL+vCVi05DX08BAqCvp4CvXHQaBhb11e9N7dvei68ke32dxODEUixd+yjmrf4pJr0qUy76F2DNqMcMXoznBD2fNqzuOcC1+4z3sN74wgR+wHhdmBYCbAncNJiTp7obWNRXE9TtZZWrls+vX+APmaf3c7RwIq7bvKeyoPzC5H/D7I4/OLzvnKmg6DYGszIHAH78SZ9NSH5sawvW1Iq56cnMkxcPOT9/WsG58ihsiaTX35caijN5ajinssrrNu+pX9+YQKdEBZQv4KulS6sqhr46vgJHtbPmeVW5a7cx/HEYWNNtBN7FH405OFvVi/0ErQeuseTJy8838+9mKsapv0+UEkm2BG4anMlTwzmVVRZLE1i3ZW99ZvNOJxMdPeR8IIiZxjBns51dQO64qkNFvvv9LgDABR2PYU3+e5iBVwEAEwrkBEbAtFaSmLPnUrG2usb8/eiBcj+fIFyqgaxVL06bwobucnhdOWdvr5aJWxWT5GlQFAuD
PDWcW1ml2+OJsKcrSkdQGywFOPE04OCTUw+NHQHyk8ZCazlAzXrwUSz+48P4Wn49OmW88tQcYPRZtwd4a7D1qq4pFY0FWq+UjeufS/Us2a0/kJPiIWOc5pijNoCzS+o6FAvTNdRwSZVVDu4Yrix+Ll37qH+6p6qsD3Cc1e77d++qkN0b8bB8Crfnv1UV4CsmS9UVJGF7wuhE7YEcpo68xw1A47UpYNVLZjHIU8MlUVbpl9d3vAEECrgus13zwOuffBbTiy96NzIzn+t2HqmX7jnGgRz26pdCr+WgDgeSi9GmAKx6yTCma6hKI6pezOvFeR+vvD6AquoX8wZwYe5g9IZk3bODz8oLM6I1KzMXJv3SHE7X1onqvu6OnTY9dvay6iWzGOSpwpwd24MjgLoE+iDXdLvpeOX13W4Av8+dgBMxEmB0LqWIm1f6v9RMtXgF+FynsUGreGgqx25frLWytwdY+CFg+3dqUzdmWsl6k7C+zutTBateMotBnioaXvXiw+umM6ungGGHQD+rp+B6A/jK2CW4vevb3gE4XzCC6HNba6tCzNOT3BR6jRJEr5uB+ZygC5JOVTK7vu+emzdTRdbgbi4au6WPCr1cIM0w5uSpIpWqFw9eNx2vvL7bAu7Q684yasGt+e58V/lrS0fF8291PijarS+O+fpOo7TSM/XR2RUuoLq1B3Brk5CfDtz36eqeMfd92gj8brXr5hGAlEmcyVOF1+w4qjg5fq+bjj2v313IQwS4esNO9EzPI98hKE1OpVwExieBNfc/gxv06NQ//NIRANUlkq7sKZDCDODYn6Zq6s1DOfJd7tdwmml71Y+7LYiaVTiTperHnWr/J8aMtsLX7qseP2vX24KoNv6Qpv7+fh0aGmr4+5I3e3oEMGbH9l4zQQN30Ou5jeWajTsx6fLP88NL5uLLA6e5vg8wlVm3Ztgf6/ysewuCIO1zraJUzxR6gfFi7YHabo2/3N6je45Rwx/m8JM1o+HGSk1HRLaran+Y1zBdQxVBmomFaUngVwHjZnDHMFZt2uUa4AHg7m37ccPgHtf3AaYCu/Uys8QhwAPRSgh9X+NwghIQrjujWzuEsAGe2haDPFUZWNRXyWublSrWAB4mcEfN8a/bshelCf9PmPds2x/oelXvrSc4/0GUEkLf1zicoOTWEdPthmG2TrbXzVcajAUUuuskZQVz8lTFqaLlqg07cdWGnegp5HG4WHJ8nTXQmukctzA9q6fgmfIJGrS1/F5uawlOvjq+Amvzd2C6jE09GLVxlmMtuoVTCsitQsfrhmFW9tTM3O3JKJR3xU5WV9905Lm42sY4k6cqbqkPAK4BHgA6RDC4Y9hItfxol2vQLeRzeM8pM2tSPldv2FlJv4RZ6DUrbfK5YLPa+yffhesnPoEXcAImVfASZuKp024GFqxw3iVr7lw1Ozlad5W6zbIB9xtH1O6Mrqkh26eFgW8B7//n2se4uNq2uPBKVeat/mnkw60FwLQOoOTSg6uvPGNft2Wv401AANx26ekAgFWbdgVK2QiAfWvPw1v/4SEcdXlj61zXac+nAHjnW3rx9P7RqhvcBzr/A2vzd2DaxJ+nnuy2SBqmYibK2adeC7BhF4ypZUVZeGW6hqqESX3YKdwDPADPAG++/qoNO5ETwUTAyYf5CcItwJvXdfq99bHHf1O7iHkVflgd4IHqXaVWfq0I4h5q7ZQaYn92CiB2kBeROQC+B+BEGAdlrlfV2+NelxrDnht/zykzseHJA1U15klxKnN0EjTAm89d9aNdcYblKmwljus6g9OuVWufmSDYn50iip2uEZE3Anijqj4tIq8FsB3AgKr+0u01TNc0B6f68nyHYBLARMJBXgRIITMYi1tN/UuYiW0X/l/H0lLr91IAXL5kLr6874NMtVAiUqmTV9UXVfXp8u//BOBZAI1vdEKhOS2yliY18QAPtEaAty/dfh2XYTx3fNVjR7UT/2vskqqFYsD5e6kwyjyVh1pT
ihKtrhGRkwAsAvCEw5+tFJEhERkaGQnSCZDqLU5Pmsgte5tUvkMwvXOqH0xPIY93vf9TmHbh/8FLmIlJFRycPAGrS3+L+yffVQng5h4Ct++lAvg9EqzNJwopsSAvIq8BcC+Aq1T1j/Y/V9X1qtqvqv0zZ85M6m3JJsxpSVF70vT1FCJX4DSLrs5c5UYlYuT2j4xNzcSPjZcXcheswDv+fDvefOwevGvsG7h/8l2V5yhQ2QTm9b38ytglPNSaUpNIkBeRPIwAf4+qbk7imhSeU8uBVZt24fSbtzoGfadOjvkO7zm6AHh89Znoi9G0rBmMTUxOtT1Q1LRQsO7i9Qrg5gx+1fL5rp9uKt0v7btfuWhKDRA7yIuIALgTwLOqemv8IVFUjjn2CcXhYslx05FTr5p1lyxEIe/+z0IBLPrSVhw6csz1Oa2QyglSgz8cIICbN4CBRX1451tqN0XlO8Q41nDBCuf2xUR1lsRMfimAjwA4U0R2lv/3Nwlcl0IKkmO355IHFvXh8dVnYt/a8/D46jMBAOM+C6+vHC2h6FKXXsjncPmSuZUbh88Hg7rI5wQfXjI39nUExqejgUV9uHzJ3JpAbz2XdnDHMJ7cV9uXxmPbAFFDJFFd85iqiqouUNXTy/97MInBUThBc+zWXLLV4I5hfH5jsJ2mbgRaaRx2+ZK5rkeK1lNpQnF3eQxxWL9PXx44Dbdderprh851W/Y67i2YmFTfrptE9cQdrxmyavn8wBuO7LN+M58fZiOSE3Pn6fDhYiKBNm3W75PXubRen6LSOlmLCGCQzxT7aUk90/N45ahzUzH7rN+rMVk7C/rpyKsdRNBrxDlFi8gNg3zG2GebNwzuMTbkWJ5jzSWbONusVVk0DWDV8vlY9aNdNSmbfC7YNbwOLWegpzjYajjj/HLJpp7p+XQG2MRec/w0zwBr3ZOwbsteXHrGHPQUpr6PM6bnse4DCwMF6ainaBH54Uy+DXjlkgd3DOPmnzzjmtZpZ4c9vidOM+97tw8HOr/WSdRTtIj8MMi3MadUTjuY1iG+ZaIA0F3Iu+bJvWbefrN/p+u55fSj7komMjHIZ5jXQt4Ng3syUf1i53QoiF2QAA8Afzo2XpVnt+bJo8y8vfLuTpVRTmsnRGHxZKiMcmt9qzByxUHTM2EO8EhbTgQf/Ms52PDUgdC1/kFuDiazpYPTzLuvp1DZVGa3dO2jnq9hdQ354clQVOHW+hZAqPz7hGqoAJim/71iIQBgw5MOvdt9hPn7DR8uoqeQRz4nVTcTv5m33+zfa+2EKCpW12RUkgt2rRDgzfYJbjtPk3a4WALU+FTkVbVk5ZZfZ96d6okz+YyKc1ZrK5pU43zYRipNKqZ3TsOOG88O9Hzm3SkNnMln1Krl833bBtt1wJiZUnB+C63W3v4Aarp+XrzYqNQJ0v+fKArO5LMsRIzvKeSx5oK3AYDjzk1y5pZqsZenmpU0X7notMrCLHe5UiMwyDexONUW67bsDVRh4lTZIa3QEL4JuKVaBncMO+4/sNfRR621JwqDQT5FXkE87iwv6MLrC4eLNe/VIhWTDZfvACbVKCnNieDixc7VMOu27HVdrLb+XLjLlRqBOfmUOB3Vd93mPZWcbNxeJmG6J7IDZTClSVT2DEyo4t7tw445dK8gbf25sNqGGoFBPiV+QTzuLM/p/FY7M93AmaOhp5APtfDsdtN1C9ICVKV3nH5GrLahpDHIp8QviLsFCq9ukfauiG+f241cOcGeE8HSt/Q6dqPkzNFQmpjEq38eD/Uap5+jU/AWGCdlWdM7A4v6cPHivqqfkVsKiCgq5uRT4teQatXy+Vi1qfYovlf/PF45d9Sa0+8u5HFkbLzy/OHDxarrT6ji6f2jjht23N6r3RwZC5+ycrpB2g9vcVs0H9wxjHu3D9ekgPrf1MtAT4lhkE+J38aYgUV9WHP/M8bOSouS5cxQ6+vtz3Nir9wwbxLttGkqSV6plSAtClhdQ42QSJAX
kXMA3A4gB+AOVV2bxHWzxl5Nc/HiPvz8VyOus71Rl8D9wuFi5MVSM73g1MAs6wr5DhTLZ9DGJUDk3vEmVtdQI8QO8iKSA/CPAM4CcBDAUyJyv6r+Mu61syTKIRNeKZ2ogcBML7RTRc3St/Ti+f9XTPQTSxKJLfaQp0ZIYuH1DAC/VtXfquoYgB8CuDCB62ZKlJJIr+qLKIHAml5op9nif/zmUF1SUtaSVzf21gbW57O6hhohiSDfB8Da2/Vg+bEqIrJSRIZEZGhkZCSBt20tUT6aDyzqq+l1Ys78nQJEvkPg1q4mJ1L1qaGdZov1Wk4uliZw1Yadrj1n/PZCeP18iZIS+9AQEbkEwHJV/dvy1x8BcIaqfsbtNe14aIjfgRFR3DC4Bz944kBlB+YH/3IOANRsqS/kczXBox1z8vWUzwm6OqdhtFiqrK+4LWrH+ZlTe4tyaEgSM/mDAOZYvp4N4IUErpspSX80dyq/2/DkAWx46kBVgBfAsfbaOouk+EoTisPFUtWM3S1F1E6pMkpfEkH+KQAni8g8EekEcBmA+xO4bqYk/dHcKcdfmtSaWncF8PNfOafHBhb14fHVZ4ZpVkkBFUsTlU1Odu2UKqP0xa6uUdVxEfk7AFtglFDeparPxB5ZBiV5vFuY2aDfc7sL+UB19lZ9PQUcPjoWaQNRu5hQRSGf4yEhlKpE6uRV9UEADyZxrXYWprVwmJOfnGaOcVsLr1o+H9dsbOxJTK2mz5Kb5+HclBbueG0SYVsLO+2YzXcIIPA9XDpua+EZ0/O4+SfPgOeKTMl3SNVBK+b3nYdzU9rYoKxJhK2jd8rxr7tkIdZ9YKFv3j/uRqgjx8bxytFw6Z0sM7/3LIWkZsSZfJOIWkdv70NjpgVuu/R01yATt7pjLCONzATeNfRmugVApRzS/hrO2KnZMcg3iThb3MOmeoLk8ztzgq7jprX0jN0viHv9mTV4A3C9mTLHTs2O6ZomEaeOPmyqJ8iBIhOTwE3ve5vvezczhfdZ5m4ljoD7988sO9239jw8vvpMBnhqegzyTSJOHX3YVE+QjVATqhhY1IeeQrCTknIigZ8LAF2dOfQU8pW/a70OD3ebredzxg5hr5sdNy1RFjBd00SC5HWd0gVRUj0LnRmwAAAKhElEQVTme73lugcru2atzKC75oK3+bY/MNsmAAjcKuHI2AT6ejqx5oK3YWBRHy7/l//E47855Pu6qHosewFmTM/jpvcZ79v/pl58fuMux+8BNy1RFjDItxC33PvFi/tw7/bh0JtuBncM47hpgqMlhyBf/nOnU47ec8pMzz74QQ8iMcc/9LtDePL5V4J8CyLx6hVjjtvrABeiVha7QVkU7digLAleTc6CbLqxfgromZ7Hq38er6rtdrpunEZaNwzuwd3b9vs+LyfiOJNOynHTOlDI56qah3l9b7igSs0qSoMyzuRbiFfu3S/VY/8UEKRqJm5O+qe7Xwz0vHoGeAA4Nj6JY+PGiVBulUcsgaSsYpBvIXHKLKNsgIqSk7bOiIOG7nrP5O14jiq1Ewb5FuJ3+LeXsLPyQj6H95wyE0vXPooXDhfRXchDBDh81DvlEbZHfSGfw8WL+7DhqQM1HTTriZUz1C4Y5FuI0yJo0NxxkA1Q5uahvvLiqnUx19ql0inlMbhj2LVKxcvFi/vw5YHT0P+mXtz8k2catvmKlTPULhjkW0zU3PGq5fNx9YadrimUPtsNY+naRz1n5MXSRFUXyus274mUcjF73Vv/XvNW/9Q31dNXPsx8Vk8BR8fC9dIRgJUz1DYY5NvEwKI+XLXBuTWwADVVNEHSGZMKrNq0C12d0yI3PLO/z+COYXT45OjtVT9OaaJ8h6BzWkdNv3sBcPmSuczHU9tgkG8jfSEWboP2qzePvYvK+t5msPYK8E5rEF5pLJZGUrtjkG8jYRZunZ4b1Yzpecd0ij1t4lcBZE8pWbmlsVgaSe2OQb6NhFm4NR8LspjaIXA9QMQMzE43
jOPzHbh6w06s27IXq5bPd00RCYB9a8+reZyzdCJ/DPJtJszM1nzeqh/t8twZO6nOgd7erte+27ZYqt6g1OMy43c7vjBMe2WidsUulORpYFEf1l2y0LfD5KQaTcDcumhaW/RO75xWc9MoliagipqukAIjgC9d+ygGdwxXHg/bXpmoXcWayYvIOgDvAzAG4DcAPqaqh5MYGDUP++zfrcRxtFjCzpvOdrxGkJ2wo8USbrv0dMdTmOwz9SgnaRG1o7gz+YcBnKqqCwD8F4Dr4g+J0jC4YxhL1z6Keat/WjNrtnPbSOT2uJlaGfZpdTCrp1CZ8ff1FGqea52phx0DUbuKFeRVdauqjpe/3AZgdvwhUaPZg7A5a3YL9GFPsQrSN8f+er+ZepyTtIjaSZI5+SsBPOT2hyKyUkSGRGRoZGQkwbeluMLmt8OeYuWVQnF7vd9MPc5JWkTtxLefvIj8DMCJDn90vareV37O9QD6AVykARrUs598+oLkyN1KF8Py6oPv1q/eaRereQIVAzm1q7r0k1fV9/q86RUAzgewLEiAp/QF7RaZVH47SvfMOM3YiGhK3OqacwBcC+CvVfVoMkOieouSI48jasDmblWi+OJuhvomgOMAPCzGyc/bVPWTsUdFdeWXI6/HrJkBmygdsYK8qv5FUgOhxnFrPhb3TFciaj7c8dqGWH5I1D7Yu6YNcVGTqH0wyLcp5siJ2gPTNUREGcYgT0SUYQzyREQZxiBPRJRhDPJERBnGIE9ElGEM8kREGcYgT0SUYQzyREQZxh2vVMV6mAjbHRC1PgZ5qrAfJmKe9QqAgZ6oRTFdQxVhz3oloubHIE8VboeJeB0yQkTNjUGeKtzOdE3qrFciajwGeargYSJE2cOFV6rgYSJE2cMgT1V4mAhRtiSSrhGRvxcRFZETkrgeERElI3aQF5E5AM4CsD/+cIiIKElJzORvA/AFAJrAtYiIKEGxgryIXABgWFV3BXjuShEZEpGhkZGROG9LREQB+S68isjPAJzo8EfXA/gigLODvJGqrgewHgD6+/s56yciagDfIK+q73V6XEROAzAPwC4RAYDZAJ4WkTNU9aVER0lERJFELqFU1T0AXm9+LSLPA+hX1T8kMC4iIkoAd7wSEWVYYpuhVPWkpK5FRETJ4EyeiCjDGOSJiDKMQZ6IKMMY5ImIMoxBnogowxjkiYgyjEGeiCjDGOSJiDKMQZ6IKMNEtfENIUVkBMDv6nDpEwC0Wu8cjrn+Wm28QOuNudXGC7TmmOer6mvDvCCVM15VdWY9risiQ6raX49r1wvHXH+tNl6g9cbcauMFWnfMYV/DdA0RUYYxyBMRZVjWgvz6tAcQAcdcf602XqD1xtxq4wXaZMypLLwSEVFjZG0mT0REFgzyREQZltkgLyJ/LyIqIiekPRY/IrJORH4lIrtF5Mci0pP2mJyIyDkisldEfi0iq9Mejx8RmSMiPxeRZ0XkGRH5XNpjCkJEciKyQ0QeSHssQYhIj4hsKv8bflZE3pH2mPyIyNXlfxO/EJEfiMjxaY/JTkTuEpGXReQXlsd6ReRhEXmu/OsMv+tkMsiLyBwAZwHYn/ZYAnoYwKmqugDAfwG4LuXx1BCRHIB/BHAugLcC+KCIvDXdUfkaB/B5Vf3vAJYA+HQLjBkAPgfg2bQHEcLtAP5VVU8BsBBNPnYR6QPwWQD9qnoqgByAy9IdlaPvADjH9thqAI+o6skAHil/7SmTQR7AbQC+AKAlVpVVdauqjpe/3AZgdprjcXEGgF+r6m9VdQzADwFcmPKYPKnqi6r6dPn3f4IRfPrSHZU3EZkN4DwAd6Q9liBE5HUA/grAnQCgqmOqejjdUQUyDUBBRKYBmA7ghZTHU0NV/x3AIdvDFwL4bvn33wUw4HedzAV5EbkAwLCq7kp7LBFdCeChtAfhoA/AAcvXB9HkAdNKRE4CsAjAE+mOxNfXYUxQJtMeSEBvBjAC4NvlFNMdItKV9qC8qOowgK/B
+KT/IoBRVd2a7qgCe4OqvggYkxgAr/d7QUsGeRH5WTmXZv/fhQCuB3Bj2mO08xmz+ZzrYaQY7klvpK7E4bGW+KQkIq8BcC+Aq1T1j2mPx42InA/gZVXdnvZYQpgG4O0A/klVFwE4ggAphDSV89gXApgHYBaALhH5cLqjqp9UetfEparvdXpcRE6D8YPbJSKAkfZ4WkTOUNWXGjjEGm5jNonIFQDOB7BMm3PzwkEAcyxfz0YTfsS1E5E8jAB/j6puTns8PpYCuEBE/gbA8QBeJyJ3q2ozB6CDAA6qqvkJaROaPMgDeC+Afao6AgAishnAOwHcneqogvm9iLxRVV8UkTcCeNnvBS05k3ejqntU9fWqepKqngTjH+Db0w7wfkTkHADXArhAVY+mPR4XTwE4WUTmiUgnjIWq+1Mekycx7vR3AnhWVW9Nezx+VPU6VZ1d/rd7GYBHmzzAo/zf1gERmV9+aBmAX6Y4pCD2A1giItPL/0aWockXiy3uB3BF+fdXALjP7wUtOZPPoG8COA7Aw+VPINtU9ZPpDqmaqo6LyN8B2AKjGuEuVX0m5WH5WQrgIwD2iMjO8mNfVNUHUxxTFn0GwD3lm/9vAXws5fF4UtUnRGQTgKdhpEd3oAlbHIjIDwC8G8AJInIQwE0A1gLYKCIfh3GzusT3Os2ZGSAioiRkKl1DRETVGOSJiDKMQZ6IKMMY5ImIMoxBnogowxjkiYgyjEGeiCjD/j+laLe01jg5JQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(0)\n",
    "plt.xlim(-5, 10)\n",
    "plt.ylim(-5, 10)\n",
    "\n",
    "# one scatter call per distribution, so each gets its own default colour\n",
    "plt.scatter(dist_01[:, 0], dist_01[:, 1])\n",
    "plt.scatter(dist_02[:, 0], dist_02[:, 1])\n",
    "plt.show()"
   ]
  },
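  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us now represent the data in tabular form: one row per point, with the two coordinates in the first two columns and the class label in the last column. Points from `dist_01` keep the label 0 and points from `dist_02` get the label 1."
   ]
  },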
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000\n",
      "500\n"
     ]
    }
   ],
   "source": [
    "r = dist_01.shape[0] + dist_02.shape[0]\n",
    "c = dist_01.shape[1] + 1\n",
    "data = np.zeros((r, c))\n",
    "print(data.shape[0])\n",
    "print(dist_01.shape[0])\n",
    "\n",
    "data[:dist_01.shape[0], :2] = dist_01\n",
    "data[dist_01.shape[0]:, :2] = dist_02\n",
    "data[dist_01.shape[0]:, -1] = 1.0\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now shuffle the data and check by printing the first 10 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2.70053391  0.8968841   0.        ]\n",
      " [ 4.24776049  4.28899343  1.        ]\n",
      " [ 0.0259604   0.3846876   0.        ]\n",
      " [ 1.68944854  1.33691757  0.        ]\n",
      " [ 5.20229454  3.59801364  1.        ]\n",
      " [ 0.88110403  4.05051213  1.        ]\n",
      " [ 1.12208367 -0.656618    0.        ]\n",
      " [ 5.0039455   5.19247128  1.        ]\n",
      " [-0.12134508  0.32621837  0.        ]\n",
      " [-0.82679841  1.67429995  0.        ]]\n",
      "[[2.70053391 0.8968841 ]\n",
      " [4.24776049 4.28899343]\n",
      " [0.0259604  0.3846876 ]\n",
      " ...\n",
      " [4.96909499 5.95677514]\n",
      " [3.11352586 2.64212744]\n",
      " [1.15548122 1.08835587]]\n"
     ]
    }
   ],
   "source": [
    "np.random.shuffle(data)\n",
    "print(data[:10])\n",
    "print(data[:, :2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Next, we implement our KNN algorithm. There are many ways to do this, but a basic approach will require a pairwise distance measure for instances, and a way to take a \"training\" dataset of classified instances and make a prediction for a \"test\" data instance. Here is a top-level outline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "def distance(x1, x2, p=2):\n",
    "    # Minkowski distance of order p (p=2 gives the Euclidean distance)\n",
    "    return (np.sum(np.abs(x1 - x2)**p))**(1/p)\n",
    "\n",
    "def knn(X_train, y_train, xt, k=7):\n",
    "    # Euclidean distance from the query point xt to every training point\n",
    "    # (broadcasting subtracts xt from each row of X_train)\n",
    "    diff = X_train - xt\n",
    "    dist = np.linalg.norm(diff, axis=1)\n",
    "    # indices of the k smallest distances (in no particular order)\n",
    "    index = np.argpartition(dist, k - 1)[:k]\n",
    "    k_nn = y_train[index]\n",
    "    # find the most frequent label among the k nearest neighbours\n",
    "    # return_inverse=True: for each element, its index in the array of unique values\n",
    "    u, indices = np.unique(k_nn, return_inverse=True)\n",
    "    return u[np.argmax(np.bincount(indices))]"
   ]
  },
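  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check of the two ingredients (a distance measure and a majority vote), here is a minimal, self-contained sketch on a hand-made toy dataset. The names `minkowski_sketch`, `knn_sketch`, `toy_X` and `toy_y` are assumptions made for illustration and are not used elsewhere in this notebook. The sketch sorts all distances with `np.argsort`, which is simpler to read than `np.argpartition` but does more work than strictly necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def minkowski_sketch(x1, x2, p=2):\n",
    "    # Minkowski distance of order p: p=2 is Euclidean, p=1 is Manhattan\n",
    "    return np.sum(np.abs(x1 - x2) ** p) ** (1 / p)\n",
    "\n",
    "def knn_sketch(X, y, query, k=3, p=2):\n",
    "    # distance from the query to every row of X\n",
    "    dists = np.array([minkowski_sketch(row, query, p) for row in X])\n",
    "    # indices of the k smallest distances\n",
    "    nearest = np.argsort(dists)[:k]\n",
    "    # majority vote among the labels of those k rows\n",
    "    labels, counts = np.unique(y[nearest], return_counts=True)\n",
    "    return labels[np.argmax(counts)]\n",
    "\n",
    "# hypothetical toy data: three points near the origin (class 0), two near (5, 5) (class 1)\n",
    "toy_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])\n",
    "toy_y = np.array([0.0, 0.0, 0.0, 1.0, 1.0])\n",
    "\n",
    "origin = np.array([0.0, 0.0])\n",
    "corner = np.array([3.0, 4.0])\n",
    "print(minkowski_sketch(origin, corner))       # Euclidean: the 3-4-5 triangle\n",
    "print(minkowski_sketch(origin, corner, p=1))  # Manhattan: 3 + 4\n",
    "\n",
    "print(knn_sketch(toy_X, toy_y, np.array([0.5, 0.5]), k=3))  # query near the class-0 points\n",
    "print(knn_sketch(toy_X, toy_y, np.array([5.5, 5.0]), k=3))  # query near the class-1 points"
   ]
  },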
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now check to see if we can make a prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0\n"
     ]
    }
   ],
   "source": [
    "test_point = np.array([8, -4])\n",
    "# Predict from the first 10 rows only and check that the result comes out as 0.0\n",
    "print(knn(data[:, :2][:10], data[:, -1][:10], test_point))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0 1 2] [1 0 0 0 2 1 1]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# small check of the majority-vote step used inside knn()\n",
    "a = np.array([1,0,0,0,2,1,1])\n",
    "u, indices = np.unique(a, return_inverse=True)\n",
    "print(u,indices)\n",
    "u[np.argmax(np.bincount(indices))]"
   ]
  },
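  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above also shows what happens on a tie: labels 0 and 1 both occur three times, and `np.argmax` returns the first maximum, so the smaller label wins. A small sketch of the same vote using `return_counts=True` makes the tally explicit (the helper name `majority_vote_sketch` is an assumption made for illustration). For a two-class problem, choosing an odd $k$ avoids such ties altogether."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def majority_vote_sketch(labels):\n",
    "    # unique labels (sorted) and how often each one occurs\n",
    "    values, counts = np.unique(labels, return_counts=True)\n",
    "    tally = dict(zip(values.tolist(), counts.tolist()))\n",
    "    # argmax returns the first maximum, so ties go to the smallest label\n",
    "    return values[np.argmax(counts)], tally\n",
    "\n",
    "winner, tally = majority_vote_sketch(np.array([1, 0, 0, 0, 2, 1, 1]))\n",
    "print(winner, tally)"
   ]
  },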
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a train and test split of the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(750, 2) (750,)\n",
      "(250, 2) (250,)\n"
     ]
    }
   ],
   "source": [
    "np.random.shuffle(data)\n",
    "split = int(0.75 * data.shape[0])\n",
    "# print split\n",
    "train_data_X = data[:split, :2]\n",
    "train_data_y = data[:split, -1]\n",
    "test_data_X = data[split:, :2]\n",
    "test_data_y = data[split:, -1]\n",
    "\n",
    "print(train_data_X.shape, train_data_y.shape)\n",
    "print(test_data_X.shape, test_data_y.shape)\n"
   ]
  },
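  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `np.random.shuffle` changes the table in place and uses the global random state, the split above differs from run to run. One possible way to make it repeatable is sketched below, assuming a table whose last column holds the label. The function name `train_test_split_sketch`, its parameters and the small `demo_table` are hypothetical and not part of the notebook's own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def train_test_split_sketch(table, train_fraction=0.75, seed=0):\n",
    "    # a seeded generator gives the same permutation on every run\n",
    "    rng = np.random.default_rng(seed)\n",
    "    shuffled = table[rng.permutation(table.shape[0])]  # shuffled copy, original untouched\n",
    "    cut = int(train_fraction * table.shape[0])\n",
    "    # features are all columns but the last; labels are the last column\n",
    "    return shuffled[:cut, :-1], shuffled[:cut, -1], shuffled[cut:, :-1], shuffled[cut:, -1]\n",
    "\n",
    "# hypothetical table: 10 rows, 2 feature columns, 1 label column\n",
    "demo_table = np.column_stack([np.arange(20.0).reshape(10, 2), np.arange(10) % 2])\n",
    "demo_X_tr, demo_y_tr, demo_X_te, demo_y_te = train_test_split_sketch(demo_table)\n",
    "print(demo_X_tr.shape, demo_y_tr.shape)\n",
    "print(demo_X_te.shape, demo_y_te.shape)"
   ]
  },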
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Next we need to implement some way to run our KNN classifier on all the test data and get the results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.012\n",
      "0.988\n"
     ]
    }
   ],
   "source": [
    "def get_acc(kx):\n",
    "    # count the test points whose predicted label differs from the true label\n",
    "    error = 0\n",
    "    for idx, j in enumerate(test_data_X):\n",
    "        y_head = knn(train_data_X, train_data_y, j, kx)\n",
    "        if y_head != test_data_y[idx]:\n",
    "            error += 1\n",
    "    # print the error rate, then return the accuracy\n",
    "    print(error/test_data_y.shape[0])\n",
    "    return 1 - error/test_data_y.shape[0]\n",
    "\n",
    "\n",
    "print(get_acc(7))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What accuracy did you get? You should get around 99 percent on this dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try different values of K."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.012\n",
      "k: 2 | Acc: 0.988\n",
      "0.012\n",
      "k: 3 | Acc: 0.988\n",
      "0.012\n",
      "k: 4 | Acc: 0.988\n",
      "0.012\n",
      "k: 5 | Acc: 0.988\n",
      "0.012\n",
      "k: 6 | Acc: 0.988\n",
      "0.012\n",
      "k: 7 | Acc: 0.988\n",
      "0.012\n",
      "k: 8 | Acc: 0.988\n",
      "0.012\n",
      "k: 9 | Acc: 0.988\n",
      "0.012\n",
      "k: 10 | Acc: 0.988\n",
      "0.012\n",
      "k: 11 | Acc: 0.988\n",
      "0.012\n",
      "k: 12 | Acc: 0.988\n",
      "0.012\n",
      "k: 13 | Acc: 0.988\n",
      "0.012\n",
      "k: 14 | Acc: 0.988\n",
      "0.012\n",
      "k: 15 | Acc: 0.988\n",
      "0.012\n",
      "k: 16 | Acc: 0.988\n",
      "0.012\n",
      "k: 17 | Acc: 0.988\n",
      "0.012\n",
      "k: 18 | Acc: 0.988\n",
      "0.012\n",
      "k: 19 | Acc: 0.988\n"
     ]
    }
   ],
   "source": [
    "for ix in range(2, 20):\n",
    "    print (\"k:\", ix, \"| Acc:\", get_acc(ix))"
   ]
  },
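  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop above classifies one test point at a time. As a rough sketch of an alternative, the cell below computes all test-to-train distances in one broadcast operation and sweeps over a few odd values of $k$. It is self-contained and runs on freshly generated synthetic clusters, so its numbers are not comparable with the results above. The cluster means, sample sizes and all names ending in `_demo` or `_sketch` are assumptions chosen for illustration, and the vote shown only works for binary 0/1 labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng_demo = np.random.default_rng(0)\n",
    "# two synthetic 2-D Gaussian clusters (illustrative values only)\n",
    "cluster_a_demo = rng_demo.normal(loc=1.0, scale=1.0, size=(200, 2))\n",
    "cluster_b_demo = rng_demo.normal(loc=5.0, scale=1.0, size=(200, 2))\n",
    "X_demo = np.vstack([cluster_a_demo, cluster_b_demo])\n",
    "y_demo = np.concatenate([np.zeros(200), np.ones(200)])\n",
    "\n",
    "# shuffle, then use the first 300 rows for training and the rest for testing\n",
    "order_demo = rng_demo.permutation(400)\n",
    "X_demo, y_demo = X_demo[order_demo], y_demo[order_demo]\n",
    "X_tr_demo, y_tr_demo = X_demo[:300], y_demo[:300]\n",
    "X_te_demo, y_te_demo = X_demo[300:], y_demo[300:]\n",
    "\n",
    "def predict_all_sketch(X_tr, y_tr, X_te, k):\n",
    "    # all pairwise Euclidean distances at once: shape (n_test, n_train)\n",
    "    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)\n",
    "    # for each test point, the indices of its k nearest training points\n",
    "    nearest = np.argpartition(d, k - 1, axis=1)[:, :k]\n",
    "    votes = y_tr[nearest]\n",
    "    # binary majority vote: predict 1 when more than half of the neighbours are 1\n",
    "    return (votes.mean(axis=1) > 0.5).astype(float)\n",
    "\n",
    "for k_demo in (1, 3, 5, 7):\n",
    "    acc_demo = np.mean(predict_all_sketch(X_tr_demo, y_tr_demo, X_te_demo, k_demo) == y_te_demo)\n",
    "    print('k:', k_demo, '| Acc:', acc_demo)"
   ]
  },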
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Now let's try real data: MNIST"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, MNIST is image data, but here we are using a CSV version where we can view the pixels as numbers (each row has the pixel data for an image of a digit, and the first column is the class of the digit, i.e., 0-9)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2499, 785)"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('train.csv')\n",
    "df.head()\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the dataset is quite big, we will just use a subset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2000, 785)\n"
     ]
    }
   ],
   "source": [
    "data = df.values[:2000]\n",
    "print (data.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a train/test split of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1600, 784) (1600,)\n",
      "(400, 784) (400,)\n"
     ]
    }
   ],
   "source": [
    "split = int(0.8 * data.shape[0])\n",
    "\n",
    "X_train = data[:split, 1:]\n",
    "X_test = data[split:, 1:]\n",
    "\n",
    "y_train = data[:split, 0]\n",
    "y_test = data[split:, 0]\n",
    "\n",
    "print (X_train.shape, y_train.shape)\n",
    "print (X_test.shape, y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us just check that our data really does represent images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAADG1JREFUeJzt3X/oXfV9x/Hn28z8oQ1EKdpgs6WWODY1S0eQQXQ4q8WNQoyxUv8YGStNwQor7A/FfyqMgoy1W/8KpCQmQpO2YJyhlrU1jBlxiFFikzazFcnaLDHfiNVYQYrJe398T8q38XvP/eb+Ojd5Px8Q7r3nfe4975zk9f2ce8+5309kJpLquaTrBiR1w/BLRRl+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrqDya5sYjwckJpzDIzFrLeUCN/RNwZEa9GxGsR8dAwryVpsmLQa/sjYhHwc+AO4CjwInBfZv6s5TmO/NKYTWLkvwl4LTNfz8zfAt8B1g3xepImaJjwXwP8as7jo82y3xMRmyJif0TsH2JbkkZsmA/85ju0+NBhfWZuAbaAh/3SNBlm5D8KLJ/z+OPAseHakTQpw4T/RWBlRHwiIhYDnwf2jKYtSeM28GF/Zn4QEQ8APwQWAdsy86cj60zSWA18qm+gjfmeXxq7iVzkI+nCZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFGX6pKMMvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRA0/RDRARR4B3gdPAB5m5ZhRNSQD79u1rrW/fvr21vnXr1hF2c/EZKvyNv8rMN0fwOpImyMN+qahhw5/AjyLipYjYNIqGJE3GsIf9azPzWERcBfw4Iv4nM5+du0LzQ8EfDNKUGWrkz8xjze0M8CRw0zzrbMnMNX4YKE2XgcMfEZdHxJKz94HPAIdG1Zik8RrmsP9q4MmIOPs6OzPzP0bSlaSxGzj8mfk68Gcj7KWsSy+9tLV+/fXXt9YPHDgwynYm5tprr22tr169urV+5syZUbZTjqf6pKIMv1SU4ZeKMvxSUYZfKsrwS0WN4lt9GtI999zTWl+1alVr/UI91bd48eLW+mWXXTahTmpy5JeKMvxSUYZfKsrwS0UZfqkowy8VZfilojzPPwXuvvvu1vrJkycn1IkqceSXijL8UlGGXyrK8EtFGX6pKMMvFWX4paI8zz8FNmzY0Frvd57//vvvH2U7E7Ny5cquWyjNkV8qyvBLRRl+qSjDLxVl+KWiDL9UlOGXiup7nj8itgGfBWYy84Zm2ZXAd4EVwBHg3sz89fjavLhFRGv9sccem1An06XffrnkEseuYSxk720H7jxn2UPA3sxcCextHku6gPQNf2Y+C7x1zuJ1wI7m/g7grhH3JWnMBj1uujozjwM0t1eNriVJkzD2a/sjYhOwadzbkXR+Bh35T0TEMoDmdqbXipm5JTPXZOaaAbclaQwGDf8eYGNzfyPw1GjakTQpfcMfEbuA/wb+OCKORsQXgEeBOyLiF8AdzWNJF5C+7/kz874epU+PuJeyMnOo+oVq1apVrfV+f+8zZ86Msp1yvEpCKsrwS0UZfqkowy8VZfilogy/VJS/uludWbJkSdctlObILxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFeZ5/ApYuXdp1Cxekt99+u7W+a9euCXVycXLkl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiPM8/AevXr++6hZ6uu+661vott9zSWh/m12dv2LChtb579+7W+vvvvz/wtuXIL5Vl+KWiDL9UlOGXijL8UlGGXyrK8EtFRb9pkCNiG/BZYCYzb2iWPQJ8ETjZrPZwZv6g78YiLs65pvvYt29fa/3mm29urb/yyiut9ZmZmZ6122+/vfW5/URE
a73L6cMvucSxaz6Z2f6P1ljI3tsO3DnP8n/NzNXNn77BlzRd+oY/M58F3ppAL5ImaJjjpgci4icRsS0irhhZR5ImYtDwbwY+CawGjgNf77ViRGyKiP0RsX/AbUkag4HCn5knMvN0Zp4BvgXc1LLulsxck5lrBm1S0ugNFP6IWDbn4Xrg0GjakTQpfb/SGxG7gFuBj0bEUeCrwK0RsRpI4AjwpTH2KGkM+oY/M++bZ/HWMfRy0dq5c2drfe3ata31G2+8sbV+6tSpnrWnn3669bmHDrUftG3fvr21Poznn3++tb53796xbVte4SeVZfilogy/VJThl4oy/FJRhl8qyl/dPQGbN29ura9YsaK1fvDgwdb6M88807P2xhtvtD63S6dPn26tv/POOxPqpCZHfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvP8U+DBBx/suoWxWbJkSc/aokWLJtiJzuXILxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFeZ5fY3Xbbbf1rC1dunSCnehcjvxSUYZfKsrwS0UZfqkowy8VZfilogy/VFTf8/wRsRx4HPgYcAbYkpnfjIgrge8CK4AjwL2Z+evxtaoL0YEDB3rW3nvvvQl2onMtZOT/APjHzPwT4C+AL0fEnwIPAXszcyWwt3ks6QLRN/yZeTwzX27uvwscBq4B1gE7mtV2AHeNq0lJo3de7/kjYgXwKeAF4OrMPA6zPyCAq0bdnKTxWfC1/RHxEeAJ4CuZeSoiFvq8TcCmwdqTNC4LGvkj4lJmg//tzNzdLD4REcua+jJgZr7nZuaWzFyTmWtG0bCk0egb/pgd4rcChzPzG3NKe4CNzf2NwFOjb0/SuCzksH8t8LfAwYg4e97mYeBR4HsR8QXgl8DnxtOiLmTLly/vWVu8ePEEO9G5+oY/M58Der3B//Ro25E0KV7hJxVl+KWiDL9UlOGXijL8UlGGXyrKX92tsXruued61k6dOjXBTnQuR36pKMMvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyu/zqzOvvvpq1y2U5sgvFWX4paIMv1SU4ZeKMvxSUYZfKsrwS0VFZravELEceBz4GHAG2JKZ34yIR4AvAiebVR/OzB/0ea32jUkaWmbGQtZbSPiXAcsy8+WIWAK8BNwF3Av8JjP/ZaFNGX5p/BYa/r5X+GXmceB4c//diDgMXDNce5K6dl7v+SNiBfAp4IVm0QMR8ZOI2BYRV/R4zqaI2B8R+4fqVNJI9T3s/92KER8B/gv4WmbujoirgTeBBP6J2bcGf9/nNTzsl8ZsZO/5ASLiUuD7wA8z8xvz1FcA38/MG/q8juGXxmyh4e972B8RAWwFDs8NfvNB4FnrgUPn26Sk7izk0/6bgX3AQWZP9QE8DNwHrGb2sP8I8KXmw8G213Lkl8ZspIf9o2L4pfEb2WG/pIuT4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qahJT9H9JvC/cx5/tFk2jaa1t2ntC+xtUKPs7Y8WuuJEv8//oY1H7M/MNZ010GJae5vWvsDeBtVVbx72S0UZfqmorsO/pePtt5nW3qa1L7C3QXXSW6fv+SV1p+uRX1JHOgl/RNwZEa9GxGsR8VAXPfQSEUci4mBEHOh6irFmGrSZiDg0Z9mVEfHjiPhFczvvNGkd9fZIRPxfs+8ORMTfdNTb8oj4z4g4HBE/jYh/aJZ3uu9a+upkv038sD8iFgE/B+4AjgIvAvdl5s8m2kgPEXEEWJOZnZ8Tjoi/BH4DPH52NqSI+Gfgrcx8tPnBeUVmPjglvT3Cec7cPKbees0s/Xd0uO9GOeP1KHQx8t8EvJaZr2fmb4HvAOs66GPqZeazwFvnLF4H7Gju72D2P8/E9ehtKmTm8cx8ubn/LnB2ZulO911LX53oIvzXAL+a8/go0zXldwI/ioiX
ImJT183M4+qzMyM1t1d13M+5+s7cPEnnzCw9NftukBmvR62L8M83m8g0nXJYm5l/Dvw18OXm8FYLsxn4JLPTuB0Hvt5lM83M0k8AX8nMU132Mtc8fXWy37oI/1Fg+ZzHHweOddDHvDLzWHM7AzzJ7NuUaXLi7CSpze1Mx/38TmaeyMzTmXkG+BYd7rtmZukngG9n5u5mcef7br6+utpvXYT/RWBlRHwiIhYDnwf2dNDHh0TE5c0HMUTE5cBnmL7Zh/cAG5v7G4GnOuzl90zLzM29Zpam4303bTNed3KRT3Mq49+ARcC2zPzaxJuYR0Rcy+xoD7PfeNzZZW8RsQu4ldlvfZ0Avgr8O/A94A+BXwKfy8yJf/DWo7dbOc+Zm8fUW6+ZpV+gw303yhmvR9KPV/hJNXmFn1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilov4fpGqYXYdIZwUAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.figure(0)\n",
    "plt.imshow(X_train[91].reshape((28, 28)), cmap='gray', interpolation='none')\n",
    "print (y_train[91])\n",
    "plt.show()"
   ]
  },
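  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reshape works because each CSV row stores the label followed by the $28 \\times 28 = 784$ pixel values. Assuming the pixels are stored row by row (which is what NumPy's default `reshape` order expects), pixel $(r, c)$ of the image sits in column $1 + 28r + c$ of the row. The sketch below checks this on a made-up row; `row_demo` and its contents are hypothetical and not taken from `train.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# hypothetical CSV row: a label (3) followed by 784 made-up pixel values in 0..255\n",
    "row_demo = np.concatenate([[3], np.arange(784) % 256])\n",
    "\n",
    "label_demo, pixels_demo = row_demo[0], row_demo[1:]\n",
    "image_demo = pixels_demo.reshape(28, 28)\n",
    "print(label_demo, image_demo.shape)\n",
    "\n",
    "# pixel (r, c) of the image corresponds to column 1 + 28*r + c of the row\n",
    "r_demo, c_demo = 2, 5\n",
    "assert image_demo[r_demo, c_demo] == row_demo[1 + 28 * r_demo + c_demo]"
   ]
  },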
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Implementation.** Now code another ```get_acc()``` and try different values of K on our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.89\n"
     ]
    }
   ],
   "source": [
    "def get_acc(kx):\n",
    "    # count the test images whose predicted digit differs from the true digit\n",
    "    error = 0\n",
    "    for idx, j in enumerate(X_test):\n",
    "        y_head = knn(X_train, y_train, j, kx)\n",
    "        if y_head != y_test[idx]:\n",
    "            error += 1\n",
    "    # accuracy = 1 - error rate\n",
    "    return 1 - error/y_test.shape[0]\n",
    "\n",
    "print (get_acc(kx=7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
